Write list of lists to separate CSV files - r

I have a list (414 elements) which contains other lists of different lengths (ranging from 0 to 9). Each of those sublists has different numbers of rows and columns.
Some of the sublists are of length 1 like the one below:
tables_list[[1]]
[,1] [,2]
[1,] "ID Number" "ABCD"
[2,] "Code" "1239463"
[3,] "Version" "1"
[4,] "Name" "ABC"
[5,] "Status" "Open"
[6,] "Currency" "USD"
[7,] "Average" "No"
[8,] "FX Rate" "2.47"
Other sublists are of length 2 or higher like the one below:
tables_list[[17]]
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "" "" "USD" "Balance" "Movement in" "Aggregate" "Overall" "" "Overall"
[2,] "" "" "" "brought forward" "year" "annual" "aggregate" "" "funded account"
[3,] "" "" "" "from previous" "" "information" "adjustments" "" ""
[4,] "" "" "" "year end" "" "" "" "" ""
[5,] "" "" "" "1" "2" "3" "4" "" "5"
[6,] "12" "Value 1" "" "0" "3,275,020" "3,275,020" "" "0" "3,275,020"
[7,] "13" "Value 2" "" "0" "0" "0" "" "0" "0"
[8,] "14" "Value 3" "" "0" "8,267,862" "8,267,862" "" "0" "8,267,862"
[9,] "15" "Value 4" "" "0" "(590,073,321)" "(590,073,321)" "" "0" "(590,073,321)"
[10,] "16" "Value 5" "" "0" "0" "0" "" "0" "0"
[11,] "17" "Value 6" "" "0" "0" "0" "" "0" "0"
[12,] "18" "Value 7" "" "0" "0" "0" "" "0" "0"
[13,] "19" "Value 8" "" "0" "0" "0" "" "0" "0"
[14,] "20" "Value 9" "" "0" "(459,222,782)" "(459,222,782)" "" "0" "(459,222,782)"
[[2]]
[,1] [,2] [,3] [,4]
[1,] "Theme" "Year" "Comment" "Created"
[2,] "Line 17 Column 2" "N/A" "Amounts are calculated according to recent standards" "XXXXXXXXXXXX"
[3,] "" "" "paid by XXXXXXXXXXXXX" ""
I am trying to export each of those lists to an individual csv file but I cannot figure out a way to do so. Does anyone have any ideas on how to approach this? I tried using mapply but I keep getting the following error:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument

First flatten the list appropriately, then you can loop over it in the regular manner.
flattenlist <- function(x){
morelists <- sapply(x, function(xprime) class(xprime)[1]=="list")
out <- c(x[!morelists], unlist(x[morelists], recursive=FALSE))
if (sum(morelists)) {
Recall(out)
} else {
return(out)
}
}
l <- list(a=list(1:2, b=2:4),
b=c("A", "B", "C"),
z=1,
m=matrix(4:1, 2),
d=data.frame(x=1:4, y=c(1, 3, 2, 4))
)
l.f <- flattenlist(l)
n <- paste0("robj_", names(l.f), ".csv")
sapply(seq_along(l.f), function(x) write.csv(l.f[[x]], file=n[x]))
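Applied to a structure like the question's tables_list, the same idea can be sketched without a recursive flatten. This is only a sketch under assumptions: the small tables_list below is made up, each element is taken to be a bare matrix, a list of matrices, or an empty list, and the output directory and file-name pattern are arbitrary choices. Passing exactly one file path per write.csv call is what avoids the "invalid 'description' argument" error.

```r
# Hypothetical stand-in for the question's tables_list
tables_list <- list(
  matrix(c("ID Number", "Code", "ABCD", "1239463"), ncol = 2),
  list(),                                   # sublist of length 0
  list(matrix(1:4, 2),
       matrix(c("Theme", "Line 17", "Year", "N/A"), ncol = 2))
)

# Assumed output location; change as needed
out_dir <- file.path(tempdir(), "tables_csv")
dir.create(out_dir, showWarnings = FALSE)

written <- character(0)
for (i in seq_along(tables_list)) {
  item <- tables_list[[i]]
  # wrap a bare matrix (or data frame) so every element is a list of tables
  if (!is.list(item) || is.data.frame(item)) item <- list(item)
  for (j in seq_along(item)) {              # runs zero times for empty sublists
    path <- file.path(out_dir, sprintf("table_%03d_%d.csv", i, j))
    write.csv(item[[j]], file = path, row.names = FALSE)
    written <- c(written, path)
  }
}
basename(written)
```

The file name encodes the position in the outer list and in the sublist, so every table can be traced back to where it came from.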

Related

why data.frame in R change the datatype unexpectedly? [duplicate]

This question already has answers here:
How to convert a data frame column to numeric type?
(18 answers)
Closed 2 years ago.
===========================================================================
updates 2/20/2021:
I just looked into the problem and found that it is in the second file,
where Sex is originally coded as "F" and "M". When I change it with:
subject.info[subject.info$Sex=='F',]$Sex=1
subject.info[subject.info$Sex=='M',]$Sex=2
the weird thing is that R directly changed 1 to "1". What is even weirder is that the values look numeric when you print the data frame.
My question is why this happens, not how to convert the type of values in a data.frame. I don't understand why someone insists it is a duplicate question, even though similar answers can solve the problem.
=================================================================================
I have two text files. One file is .txt and the other is .csv.
The .csv file has one additional column (with NA values). All the others are the same.
When I read those files with the commands:
subject.info = read.table(paste(data_dir, "outd01_all_subject_info.txt", sep = slash), header=TRUE)
subject.info = read.csv("data_d01_features/outd01_all_subject_info2.txt", sep = ',', header=TRUE, stringsAsFactors = F)
The dataframe subject.info looks the same, but when I run:
as.matrix(subject.info)
All the data in the second file are converted to strings:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" NA "0"
[2,] "600110937794" "1" "10" NA "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" NA "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" NA "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" NA "0"
[11,] "600647003585" "1" "18" NA "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
This does not happen for the first file:
SUBJID Sex age ptsd
[1,] 600039015048 2 10 0
[2,] 600110937794 1 9 0
[3,] 600129552715 1 10 2
[4,] 600210241146 1 17 2
[5,] 600294620965 1 13 0
[6,] 600409285352 2 15 1
[7,] 600460215379 1 8 0
[8,] 600547831711 1 8 1
[9,] 600561317124 2 19 1
[10,] 600635899969 2 11 0
[11,] 600647003585 1 19 0
[12,] 600682103788 1 18 2
[13,] 600689706588 1 15 2
[14,] 600747749665 2 8 1
Is this due to the NA values? But when I replace NAs with 0 in the second file, the problem still exists:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" " 0" "0"
[2,] "600110937794" "1" "10" " 0" "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" " 0" "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" " 0" "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" " 0" "0"
[11,] "600647003585" "1" "18" " 0" "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
This problem still exists if I convert the second file to a .csv file, or if I use read.table or read.csv2.
From the output it looks like column trauma_age is of class character, which turns everything into character when the data frame is converted to a matrix. Check class(subject.info$trauma_age).
Turn it into numeric with:
subject.info$trauma_age <- as.numeric(subject.info$trauma_age)
and then try converting to a matrix, i.e. as.matrix(subject.info).
You can also use type.convert to convert the data automatically to the appropriate types without worrying about column names.
subject.info <- type.convert(subject.info, as.is = TRUE)
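As for why this happens: a column of a data frame has a single type, so assigning the number 1 into a character column such as Sex stores it as the string "1", and as.matrix returns a character matrix as soon as any column is non-numeric. A minimal sketch with made-up values (the three-row data frame below is hypothetical, not the asker's file):

```r
# Hypothetical miniature of the second file
subject.info <- data.frame(SUBJID = c(1001, 1002, 1003),
                           Sex = c("F", "M", "F"),
                           age = c(11, 9, 18),
                           stringsAsFactors = FALSE)

# Recode as in the question: the column stays character, so 1 becomes "1"
subject.info$Sex[subject.info$Sex == "F"] <- 1
subject.info$Sex[subject.info$Sex == "M"] <- 2
class(subject.info$Sex)                # "character"

# One character column is enough to make the whole matrix character
m_chr <- as.matrix(subject.info)
is.character(m_chr)                    # TRUE

# After type.convert every column is numeric again, and so is the matrix
fixed <- type.convert(subject.info, as.is = TRUE)
m_num <- as.matrix(fixed)
is.numeric(m_num)                      # TRUE
```

Printing a data frame does not show quotes around character values, which is why the recoded column looks numeric on screen.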

How to bind rows without losing those with character(0)?

I have a list like L (comes from a vector splitting).
L <- strsplit(c("1 5 9", "", "3 7 11", ""), " ")
# [[1]]
# [1] "1" "5" "9"
#
# [[2]]
# character(0)
#
# [[3]]
# [1] "3" "7" "11"
#
# [[4]]
# character(0)
When I do an ordinary rbind as follows, I'm losing all the character(0) rows.
do.call(rbind, L)
# [,1] [,2] [,3]
# [1,] "1" "5" "9"
# [2,] "3" "7" "11"
Do I always have to do a lapply like the following or have I missed something?
do.call(rbind, lapply(L, function(x)
if (length(x) == 0) rep("", 3) else x))
# [,1] [,2] [,3]
# [1,] "1" "5" "9"
# [2,] "" "" ""
# [3,] "3" "7" "11"
# [4,] "" "" ""
Base R answers are preferred.
If you use lapply you don't have to worry about the length, so you can skip the rep part: rbind automatically recycles the single "" across the columns.
do.call(rbind, lapply(L, function(x) if (length(x) == 0) "" else x))
# [,1] [,2] [,3]
#[1,] "1" "5" "9"
#[2,] "" "" ""
#[3,] "3" "7" "11"
#[4,] "" "" ""
Another option, using the same logic as @NelsonGon: we can replace the empty elements with a blank string and then rbind.
L[lengths(L) == 0] <- ""
do.call(rbind, L)
# [,1] [,2] [,3]
#[1,] "1" "5" "9"
#[2,] "" "" ""
#[3,] "3" "7" "11"
#[4,] "" "" ""
Maybe this roundabout using data.table suits you:
L <- data.table::tstrsplit(c("1 5 9", "", "3 7 11", ""), " ", fill="")
t(do.call(rbind,L))
With plyr, then proceed with replacing the NA values. Since the OP asked for base R, see below.
plyr::ldply(L,rbind)
1 2 3
1 1 5 9
2 <NA> <NA> <NA>
3 3 7 11
4 <NA> <NA> <NA>
A less efficient base R way:
L <- strsplit(c("1 5 9", "", "3 7 11", ""), " ")
L[lapply(L,length)==0]<-"Miss"
res<-Reduce(rbind,L)
res[res=="Miss"]<-""
Result:
[,1] [,2] [,3]
init "1" "5" "9"
"" "" ""
"3" "7" "11"
"" "" ""
That is the defined behavior for scenarios like that. As written in ?rbind:
For cbind (rbind), vectors of zero length (including NULL) are ignored
unless the result would have zero rows (columns), for S compatibility.
(Zero-extent matrices do not occur in S3 and are not ignored in R.)
When you inspect your elements, you see that it is true:
length(L[[1]])
[1] 3
length(L[[2]])
[1] 0
However, as you see, multiple workarounds are possible.
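If the non-empty elements could also differ in length, recycling a single "" is no longer enough. One possible base R sketch is to pad every element to the longest length before binding; the name padded is just illustrative:

```r
L <- strsplit(c("1 5 9", "", "3 7 11", ""), " ")

# rbind drops the zero-length elements, as documented in ?rbind
nrow(do.call(rbind, L))   # 2

# Pad each element with "" up to the longest length, then transpose
n <- max(lengths(L))
padded <- t(vapply(L, function(x) c(x, rep("", n - length(x))), character(n)))
dim(padded)               # 4 3
```

vapply builds one column per list element, so the transpose puts each original element back on its own row.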
We can use stri_list2matrix in a simple way
library(stringi)
stri_list2matrix(L, byrow = TRUE, fill = "")
# [,1] [,2] [,3]
#[1,] "1" "5" "9"
#[2,] "" "" ""
#[3,] "3" "7" "11"
#[4,] "" "" ""

Row number in dataframe based on multiple parameters in R

I wish to find the row number, based on multiple parameters. I have made this test matrix:
data=
[,1] [,2] [,3]
[1,] "1" "a" "0"
[2,] "2" "b" "0"
[3,] "3" "c" "0"
[4,] "4" "a" "0"
[5,] "1" "b" "0"
[6,] "2" "c" "0"
[7,] "3" "a" "0"
[8,] "4" "b" "0"
Then I want to get the row number where
data[,1]==1 and data[,2]=='b'
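One way to do this, as a minimal sketch: combine the two conditions with & and pass the result to which(). The matrix is rebuilt here from the values shown above so that the snippet runs on its own.

```r
# Rebuild the test matrix from the question
data <- cbind(as.character(rep(1:4, 2)),
              rep(c("a", "b", "c"), length.out = 8),
              "0")

# Element-wise AND of both conditions, then the positions that are TRUE
which(data[, 1] == "1" & data[, 2] == "b")   # 5
```

Note the single &: the double && is not vectorised and would not compare row by row.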

Trying to build a large table in R

I am trying to build a large table in R. Yes I have heard of the table() function - in fact, I've used it several times in the code below - but I am building this because I do not want to type table() 20 times a day. I plan on just exporting this using xtable + knitr. The reason this is useful is that for those of us who have to repeatedly tabulate data, this would save a lot of time. Unfortunately, there is something wrong with the loop down here:
ESRD <- rep(c("Y", "N"), each=10)
DIABETES <- rep(c("Y", "N", "Y", "N"), c(5, 5, 5, 5))
BLAH <- rep(c("Y", "N"), each=10)
categoricalvariables <- data.frame(ESRD, DIABETES, BLAH)
descriptives <- function(VARIABLEMATRIX){
desc <- matrix(0, ncol=4, nrow=2*ncol(VARIABLEMATRIX) + ncol(VARIABLEMATRIX))
for (i in 1:ncol(VARIABLEMATRIX)){
matper <- matrix(0, nrow=dim(table(VARIABLEMATRIX[ ,i])), ncol=1)
for (i in 1:dim(table(VARIABLEMATRIX[ ,i]))){
matper[i, ] <- paste(round(prop.table(table(VARIABLEMATRIX[ ,i]))[i]*100, 2), "%")
}
matcount <- matrix(0, nrow=dim(table(VARIABLEMATRIX[ ,i])), ncol=1)
for (i in 1:dim(table(VARIABLEMATRIX[ ,i]))){
matcount[i, ] <- table(VARIABLEMATRIX[ ,i])[i]
}
desc[((3*i)-2), ] <- c(colnames(VARIABLEMATRIX)[i], "", "", "")
desc[((3*i)-1):(3*i), ] <- cbind("", names(table(VARIABLEMATRIX[ ,i])), matcount[ ,1], matper[ ,1])
return(desc)
}
}
descriptives(categoricalvariables)
The output I am getting is (clearly there is a bug but I am not sure what is wrong):
[,1] [,2] [,3] [,4]
[1,] "0" "0" "0" "0"
[2,] "0" "0" "0" "0"
[3,] "0" "0" "0" "0"
[4,] "DIABETES" "" "" ""
[5,] "" "N" "10" "50 %"
[6,] "" "Y" "10" "50 %"
[7,] "0" "0" "0" "0"
[8,] "0" "0" "0" "0"
[9,] "0" "0" "0" "0"
The expected output should be:
[,1] [,2] [,3] [,4]
[1,] "ESRD" "" "" ""
[2,] "" "N" "10" "50 %"
[3,] "" "Y" "10" "50 %"
[4,] "DIABETES" "" "" ""
[5,] "" "N" "10" "50 %"
[6,] "" "Y" "10" "50 %"
[7,] "BLAH" "" "" ""
[8,] "" "N" "10" "50 %"
[9,] "" "Y" "10" "50 %"
Here's one option:
desc <- function(x) {
af <- table(x)
rf <- prop.table(af) * 100
out <- cbind(Absolute=af, `Relative(%)`=rf)
dimnames(out) <- setNames(dimnames(out), c('Values', 'Frequency'))
out
}
lapply(categoricalvariables, desc)
#$ESRD
# Frequency
#Values Absolute Relative(%)
# N 10 50
# Y 10 50
#
#$DIABETES
# Frequency
#Values Absolute Relative(%)
# N 10 50
# Y 10 50
#
#$BLAH
# Frequency
#Values Absolute Relative(%)
# N 10 50
# Y 10 50
If you really want a character matrix
tmp <- lapply(categoricalvariables, desc)
out <- do.call(rbind, lapply(names(tmp), function(x) {
rbind(c(x, "", "", ""), cbind("", rownames(tmp[[x]]), tmp[[x]]))
}))
out <- unname(rbind(c("", "", "Abs.Freq", "Rel.Freq"), out))
out
# [,1] [,2] [,3] [,4]
# [1,] "" "" "Abs.Freq" "Rel.Freq"
# [2,] "ESRD" "" "" ""
# [3,] "" "N" "10" "50"
# [4,] "" "Y" "10" "50"
# [5,] "DIABETES" "" "" ""
# [6,] "" "N" "10" "50"
# [7,] "" "Y" "10" "50"
# [8,] "BLAH" "" "" ""
# [9,] "" "N" "10" "50"
#[10,] "" "Y" "10" "50"
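For reference, the original loop appears to go wrong for two reasons: the inner loops reuse i, overwriting the outer counter, and return(desc) sits inside the outer loop, so the function exits after the first pass. A sketch of a repaired version, assuming (as in the question's data) that every variable has exactly two levels; the name descriptives_fixed is made up here:

```r
ESRD <- rep(c("Y", "N"), each = 10)
DIABETES <- rep(c("Y", "N", "Y", "N"), c(5, 5, 5, 5))
BLAH <- rep(c("Y", "N"), each = 10)
categoricalvariables <- data.frame(ESRD, DIABETES, BLAH)

descriptives_fixed <- function(vars) {
  # three rows per variable: one header row plus two level rows (assumption)
  desc <- matrix("", ncol = 4, nrow = 3 * ncol(vars))
  for (i in seq_along(vars)) {
    counts <- table(vars[[i]])
    pct <- paste(round(prop.table(counts) * 100, 2), "%")
    desc[3 * i - 2, ] <- c(names(vars)[i], "", "", "")
    desc[(3 * i - 1):(3 * i), ] <- cbind("", names(counts), as.character(counts), pct)
  }
  desc   # return only after all variables have been processed
}

res <- descriptives_fixed(categoricalvariables)
res
```

Working on whole tables removes the inner loops entirely, which is what removes the name clash.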

How can I prevent leading spaces when transforming integer columns to character in R?

I have a data.frame with character and integer columns. I want to transform them all into characters, but I get unwanted leading spaces for the numeric columns:
> example <- data.frame(a=1:10,b=1:10,c=rep("foo",10))
> apply(example,2,format,trim=T)
a b c
[1,] " 1" " 1" "foo"
[2,] " 2" " 2" "foo"
[3,] " 3" " 3" "foo"
[4,] " 4" " 4" "foo"
[5,] " 5" " 5" "foo"
[6,] " 6" " 6" "foo"
[7,] " 7" " 7" "foo"
[8,] " 8" " 8" "foo"
[9,] " 9" " 9" "foo"
[10,] "10" "10" "foo"
The trim=T parameter is ignored apparently. This only occurs in the presence of the character column 'c', i.e. it works fine if 'c' is not present (apply(example[,-3],...)).
If I remember correctly, it's because of as.matrix, but you can bypass this by using sapply:
> sapply(example, format, trim = TRUE)
a b c
[1,] "1" "1" "foo"
[2,] "2" "2" "foo"
[3,] "3" "3" "foo"
[4,] "4" "4" "foo"
[5,] "5" "5" "foo"
[6,] "6" "6" "foo"
[7,] "7" "7" "foo"
[8,] "8" "8" "foo"
[9,] "9" "9" "foo"
[10,] "10" "10" "foo"
If you're okay with a character matrix as an output (you seem to be, based on your use of apply), try:
do.call(cbind, lapply(example, as.character))
This produces:
a b c
[1,] "1" "1" "foo"
[2,] "2" "2" "foo"
[3,] "3" "3" "foo"
[4,] "4" "4" "foo"
[5,] "5" "5" "foo"
[6,] "6" "6" "foo"
[7,] "7" "7" "foo"
[8,] "8" "8" "foo"
[9,] "9" "9" "foo"
[10,] "10" "10" "foo"
As it says in ?apply, the first argument is coerced to a matrix. In this case, it converts it to a character matrix because of column c. The call to as.matrix creates the leading spaces. The subsequent calls to format do nothing because the data are already character.
> as.matrix(example)
a b c
[1,] " 1" " 1" "foo"
[2,] " 2" " 2" "foo"
[3,] " 3" " 3" "foo"
[4,] " 4" " 4" "foo"
[5,] " 5" " 5" "foo"
[6,] " 6" " 6" "foo"
[7,] " 7" " 7" "foo"
[8,] " 8" " 8" "foo"
[9,] " 9" " 9" "foo"
[10,] "10" "10" "foo"
Without column c, it's converted to an integer matrix, and format converts the integers to character.
> as.matrix(example[,-3])
a b
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 9 9
[10,] 10 10
Better to simply use lapply:
example <- data.frame(a=1:10,b=1:10,c=rep("foo",10))
example[] <- lapply(example, format, trim=TRUE)
# use sapply if you really want a matrix
example <- sapply(example, format, trim=TRUE)
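The difference between the two routes can also be checked programmatically rather than by eye. A small sketch; the names via_apply and via_sapply are just illustrative:

```r
example <- data.frame(a = 1:10, b = 1:10, c = rep("foo", 10))

# apply() coerces with as.matrix() first, so the padding is already in place
via_apply <- apply(example, 2, format, trim = TRUE)
# sapply() formats each column on its own, so trim = TRUE takes effect
via_sapply <- sapply(example, format, trim = TRUE)

any(grepl("^ ", via_apply))    # TRUE: some cells start with a space
any(grepl("^ ", via_sapply))   # FALSE
```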

Resources