Export data frames from list to txt file - r

I have a question about exporting data frames from a list to a txt file. I found some solutions, but they only worked for vectors. Here is an example:
dataframe1 <- data.frame(a= c(1,2,3,4,5), b= c(1,1,1,1,1))
dataframe2 <- data.frame(a= c(5,5,5), b= c(1,1,1))
mylist <- list(dataframe1, dataframe2)
I would like the txt file to look like this:
$dataframe1
a b
1 1
2 1
3 1
4 1
5 1
$dataframe2
a b
5 1
5 1
5 1
Thank you for the help.

Say your list is named:
mylist<-structure(list(dataframe1 = structure(list(a = c(1, 2, 3, 4,
5), b = c(1, 1, 1, 1, 1)), .Names = c("a", "b"), row.names = c(NA,
-5L), class = "data.frame"), dataframe2 = structure(list(a = c(5,
5, 5), b = c(1, 1, 1)), .Names = c("a", "b"), row.names = c(NA,
-3L), class = "data.frame")), .Names = c("dataframe1", "dataframe2"
))
You can try:
con<-file("temp.csv",open="at")
Map(function(x,y) {cat(file=con,y,"\n");write.table(x,file=con,quote=FALSE,row.names=FALSE)},
mylist,names(mylist))
close(con)
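The above writes every data frame to the single file temp.csv, each preceded by its name. The list must have names for this to work. Because the connection is opened in append mode ("at"), running the code again adds to the existing file instead of overwriting it.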
Alternatively, if you are ok with the print method, you can just redirect the standard output to a file:
sink("temp.csv")
print(mylist)
sink(NULL)
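To get the exact layout asked for in the question (a `$name` line followed by the table), the header line can be written explicitly before each `write.table` call. This is only a sketch: the output path `out_file` is a temporary file chosen for illustration, and the list is assumed to be named.

```r
# Named list of data frames, as in the question
mylist <- list(
  dataframe1 = data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 1, 1, 1, 1)),
  dataframe2 = data.frame(a = c(5, 5, 5), b = c(1, 1, 1))
)

# out_file is a hypothetical output path used only for this example
out_file <- tempfile(fileext = ".txt")
con <- file(out_file, open = "wt")  # "wt" overwrites; use "at" to append
for (nm in names(mylist)) {
  writeLines(paste0("$", nm), con)  # header line such as "$dataframe1"
  write.table(mylist[[nm]], file = con, quote = FALSE, row.names = FALSE)
}
close(con)

written <- readLines(out_file)
print(written)
```

Opening the connection once and closing it at the end keeps all writes in a single file without re-opening it for every data frame.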

Related

Control number of rows when binding dataframes with different number of rows?

I have a data frame generated by a function. Each time it has a different number of rows:
structure(list(a = c(1, 2, 3), b = c("er", "gd", "ku"), c = c(43,
453, 12)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
structure(list(a = c(1, 2), b = c("er", "gd"), c = c(43, 453)), .Names = c("a",
"b", "c"), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
I want to control the total number of rows when I bind rows, as I would in a while loop: keep binding while the total is less than n (n = 4, 100, 4242...).
Please advise how to do this using functional programming, without a while loop.
For example, with n = 10 the accumulated data frame might have 7 rows before the last bind_rows and 20 after it. That is fine: I want the final number of rows to be the smallest k such that k >= n.
Here is my while loop doing this:
b <- list()
total_rows <- 0
while (total_rows < 1000) {
  df <- f_produce_rand_df()
  b[[length(b) + 1]] <- df
  total_rows <- total_rows + nrow(df)
}
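One possible loop-free formulation is a small recursive helper that carries the accumulated list and the running row count as arguments. The sketch below is an assumption, not an answer from the original thread: `produce_rand_df_stub` is a made-up stand-in for the asker's `f_produce_rand_df`, and `collect_until` is a hypothetical helper name.

```r
# Hypothetical stand-in for the asker's generator: 1 to 5 rows per call
produce_rand_df_stub <- function() {
  k <- sample(1:5, 1)
  data.frame(a = seq_len(k), b = letters[seq_len(k)], c = runif(k))
}

# Recursively collect data frames until the running total reaches n
collect_until <- function(n, producer, acc = list(), total = 0) {
  if (total >= n) return(acc)
  df <- producer()
  collect_until(n, producer, c(acc, list(df)), total + nrow(df))
}

set.seed(1)
n <- 50
pieces <- collect_until(n, produce_rand_df_stub)
result <- do.call(rbind, pieces)
nrow(result)  # at least n; fewer than n before the last piece was added
```

Recursion depth grows with the number of pieces, so for very large n the while loop may still be the more practical choice.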

Building sequence data for a recommender system- replacing cross-tabular matrix with a variable value

I am trying to build sequence data for a recommender system. I have built a cross-tabulated table (Table 1) and a second table (Table 2), as shown below:
I have been trying to replace all the 1s in Table 1 with the corresponding "Grade" from Table 2 in R.
Any insight/suggestion is greatly appreciated.
Instead of replacing values in the first table with those from the second, the second table can be reshaped directly to 'wide' format with dcast:
library(reshape2)
res <- dcast(df2, St.No. ~ Courses, value.var = 'Grade')[names(df1)]
res
#  St.No. Math Phys Chem CS
#1      1         A       B
#2      2         B    B
#3      3    A         A  C
#4      4    B    B       D
If we need to replace the blanks with 0:
res[res == ""] <- "0"
data
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4), Courses = rep(c("Math",
"Phys", "Chem", "CS"), 4),
Grade = c("", "A", "", "B", "", "B", "B", "",
"A", "", "A", "C", "B", "B", "", "D"),
stringsAsFactors = FALSE)

How to cast nested list to matrix like or tabular like object?

I have nested list data that I need in a particular output representation: either a matrix-like object, or the nested list exported directly as CSV files. I tried several general approaches, but exporting the nested list did not go well, so I am looking for a way to cast the nested list to a matrix-like or tabular object that holds the data the way I want. Maybe I could hold the nested list data in a data.table, but I am not sure about this. Can anyone tell me how to do this sort of manipulation easily? How can I get a clean, well-structured representation of nested list data? Thanks a lot.
Mini example:
Output of custom function:
AcceptedList <- list(
A_accepted = data.frame(pos.start=c(1,6,16), pos.stop=c(4,12,23), pos.ID=c("A1","A2","A3"), pos.score=c(11,8,13)),
B_accepted = data.frame(pos.start=c(7,19,31), pos.stop=c(13,28,43), pos.ID=c("B3","B6","B7"), pos.score=c(12,5,7)),
C_accepted = data.frame(pos.start=c(5,21,36), pos.stop=c(11,29,42), pos.ID=c("C2","C4","C9"), pos.score=c(7,13,9))
)
RejectedList <- list(
A_rejected = data.frame(pos.start=c(6,25,40), pos.stop=c(12,33,49), pos.ID=c("A2","A5","A8"), pos.score=c(8,4,7)),
B_rejected = data.frame(pos.start=c(15,19,47), pos.stop=c(18,28,55), pos.ID=c("B4","B6","B9"), pos.score=c(10,5,14)),
C_rejected = data.frame(pos.start=c(13,21,36,53), pos.stop=c(19,29,42,67), pos.ID=c("C3","C4","C9","C12"), pos.score=c(4,13,9,17))
)
I implemented this function to process the output one step further:
func <- function(mlist, threshold) {
  # split each data frame into "up"/"down" groups by score threshold
  lapply(mlist, function(x) {
    split(x, ifelse(x$pos.score >= threshold, "up", "down"))
  })
}
#example
.res_accepted <- func(AcceptedList, 9)
.res_rejected <- func(RejectedList, 9)
I am having a hard time casting the nested lists .res_accepted and .res_rejected to a matrix-like object. Ideally I would export the nested lists as CSV files, but I failed to export them in the desired way. How can I make this happen?
Ultimately, I want a list of CSV files named as follows:
A_accepted_up.csv
A_accepted_down.csv
A_rejected_up.csv
A_rejected_down.csv
B_accepted_up.csv
B_accepted_down.csv
B_rejected_up.csv
B_rejected_down.csv
C_accepted_up.csv
C_accepted_down.csv
C_rejected_up.csv
C_rejected_down.csv
The point is that the nested lists are returned by my custom function, so I want to either export them directly or cast them to a matrix-like object. Any ideas for this sort of manipulation? Thanks :)
This returns a data.frame DF of the data. No packages are used.
both <- do.call("rbind", c(AcceptedList, RejectedList))
cn <- c("letter", "accepted", "seq")
DF <- cbind(
read.table(text = chartr("_", ".", rownames(both)), sep = ".", col.names = cn),
both)
DF <- transform(DF, updown = ifelse(pos.score > 8, "up", "down"))
giving:
> DF
             letter accepted seq pos.start pos.stop pos.ID pos.score updown
A_accepted.1      A accepted   1         1        4     A1        11     up
A_accepted.2      A accepted   2         6       12     A2         8   down
A_accepted.3      A accepted   3        16       23     A3        13     up
B_accepted.1      B accepted   1         7       13     B3        12     up
B_accepted.2      B accepted   2        19       28     B6         5   down
B_accepted.3      B accepted   3        31       43     B7         7   down
C_accepted.1      C accepted   1         5       11     C2         7   down
C_accepted.2      C accepted   2        21       29     C4        13     up
C_accepted.3      C accepted   3        36       42     C9         9     up
A_rejected.1      A rejected   1         6       12     A2         8   down
A_rejected.2      A rejected   2        25       33     A5         4   down
A_rejected.3      A rejected   3        40       49     A8         7   down
B_rejected.1      B rejected   1        15       18     B4        10     up
B_rejected.2      B rejected   2        19       28     B6         5   down
B_rejected.3      B rejected   3        47       55     B9        14     up
C_rejected.1      C rejected   1        13       19     C3         4   down
C_rejected.2      C rejected   2        21       29     C4        13     up
C_rejected.3      C rejected   3        36       42     C9         9     up
C_rejected.4      C rejected   4        53       67    C12        17     up
This will write DF out in separate files:
junk <- by(DF, DF[c("letter", "accepted", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$accepted[1], x$updown[1])))
or this will write out the data frames in .res_accepted -- .res_rejected could be handled similarly:
junk <- lapply(names(.res_accepted), function(nm)
mapply(write.csv,
.res_accepted[[nm]],
paste0(nm, "_", names(.res_accepted[[nm]]), ".csv")))
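Another way to handle the nested list is to flatten it one level with `unlist(recursive = FALSE)`, which joins the outer and inner names with a dot, and then write each element to its own file. The following is a sketch under assumptions: it uses only two of the question's data frames, writes into a temporary directory (`out_dir` is chosen for illustration), and drops row names from the CSV files.

```r
AcceptedList <- list(
  A_accepted = data.frame(pos.start = c(1, 6, 16), pos.stop = c(4, 12, 23),
                          pos.ID = c("A1", "A2", "A3"), pos.score = c(11, 8, 13)),
  B_accepted = data.frame(pos.start = c(7, 19, 31), pos.stop = c(13, 28, 43),
                          pos.ID = c("B3", "B6", "B7"), pos.score = c(12, 5, 7))
)

# Same split as func() in the question, with threshold 9
res <- lapply(AcceptedList, function(x) {
  split(x, ifelse(x$pos.score >= 9, "up", "down"))
})

# Flatten one level: names become "A_accepted.down", "A_accepted.up", ...
flat <- unlist(res, recursive = FALSE)
names(flat) <- sub("\\.(up|down)$", "_\\1", names(flat))

# out_dir is a hypothetical target directory for this example
out_dir <- tempdir()
paths <- file.path(out_dir, paste0(names(flat), ".csv"))
invisible(Map(function(df, path) write.csv(df, path, row.names = FALSE),
              flat, paths))
basename(paths)
```

The same steps should apply to .res_rejected, since it has the same two-level structure.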
Note: The poster changed the data after this answer already had appeared. The output above corresponds to the original data; however, it should also work for the revised data. The original data was:
AcceptedList <-
structure(list(foo_accepted = structure(list(pos.start = c(1,
6, 16), pos.stop = c(4, 12, 23), pos.ID = structure(1:3, .Label = c("A1",
"A2", "A3"), class = "factor"), pos.score = c(11, 8, 13)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L), class = "data.frame"),
bar_accepted = structure(list(pos.start = c(7, 19, 31), pos.stop = c(13,
28, 43), pos.ID = structure(1:3, .Label = c("B3", "B6", "B7"
), class = "factor"), pos.score = c(12, 5, 7)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L
), class = "data.frame"), cat_accepted = structure(list(pos.start = c(5,
21, 36), pos.stop = c(11, 29, 42), pos.ID = structure(1:3, .Label = c("C2",
"C4", "C9"), class = "factor"), pos.score = c(7, 13, 9)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L
), class = "data.frame")), .Names = c("foo_accepted", "bar_accepted",
"cat_accepted"))
RejectedList <-
structure(list(foo_rejected = structure(list(pos.start = c(6,
25, 40), pos.stop = c(12, 33, 49), pos.ID = structure(1:3, .Label = c("A2",
"A5", "A8"), class = "factor"), pos.score = c(8, 4, 7)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L), class = "data.frame"),
bar_rejected = structure(list(pos.start = c(15, 19, 47),
pos.stop = c(18, 28, 55), pos.ID = structure(1:3, .Label = c("B4",
"B6", "B9"), class = "factor"), pos.score = c(10, 5,
14)), .Names = c("pos.start", "pos.stop", "pos.ID", "pos.score"
), row.names = c(NA, -3L), class = "data.frame"), cat_rejected = structure(list(
pos.start = c(13, 21, 36, 53), pos.stop = c(19, 29, 42,
67), pos.ID = structure(c(2L, 3L, 4L, 1L), .Label = c("C12",
"C3", "C4", "C9"), class = "factor"), pos.score = c(4,
13, 9, 17)), .Names = c("pos.start", "pos.stop", "pos.ID",
"pos.score"), row.names = c(NA, -4L), class = "data.frame")),
.Names = c("foo_rejected",
"bar_rejected", "cat_rejected"))

count non-NA values and group by variable

I am trying to show how many complete observations there are per ID, without using complete.cases or any package.
If I use na.omit to filter out the NA values, I will lose all of the IDs which might have ZERO complete cases.
In the end, I'd like a frequency table with two columns: ID and Number of Complete Observations
> length(unique(data$ID))
[1] 332
> head(data)
ID value
1 1 NA
2 1 NA
3 1 NA
4 1 NA
5 1 NA
6 1 NA
> dim(data)
[1] 772087 2
When I create my own function z, which counts non-NA values, and apply it in aggregate(), the IDs with zero complete observations are left out. I should be left with 332 rows, not 323. How does one resolve this using base functions?
z <- function(x) {
  sum(!is.na(x))
}
aggregate(value ~ ID, data = data, FUN = "z")
> nrow(aggregate(value ~ ID, data = data, FUN = "z"))
[1] 323
One of the ways to do this is using table:
df2 <- table(df$Id, !is.na(df$value))[,2]
data.frame(ID = names(df2), value = df2)
Data
structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(NA,
1, 1, 2, 2, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
In base R, you can use your utility function like this:
stack(by(data$value, data$ID, FUN=function(x) sum(!is.na(x))))
You can use table directly for this purpose. Below is sample code:
df1 <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(2,
1, 1, NA, NA, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
df2 <- as.data.frame.matrix(with(df1, table(Id, value)))
resultDf <- data.frame(Id=row.names(df2), count=apply(df2, 1, sum))
resultDf
The code builds a table of Id against value. Because table() ignores NA by default, summing each row of the table gives the number of non-NA values per Id.
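A likely reason the aggregate() call in the question drops IDs is that the formula interface defaults to `na.action = na.omit`, so rows with NA are removed before grouping. Passing `na.action = na.pass` keeps them. The sketch below uses a small made-up data frame (`toy`) and a helper name (`count_non_na`) that are not from the original post.

```r
# Made-up data: ID 1 has no complete observations at all
toy <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3),
  value = c(NA, NA, NA, 2, NA, 3, 3)
)

count_non_na <- function(x) sum(!is.na(x))

# na.pass keeps NA rows so that all-NA groups survive and count as 0
counts <- aggregate(value ~ ID, data = toy, FUN = count_non_na,
                    na.action = na.pass)
counts
```

With the default `na.action`, ID 1 would be missing from the result, which matches the behaviour described in the question.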

Merge two lists with dataframes into one list with merged dataframes

I have two lists of data frames: listA and listB. How to get a list of merged dataframes (listC)?
dfA1 <- data.frame(x1 = c("a", "b"), y1 = c(1, 2), row.names = c("1", "2"))
dfA2 <- data.frame(x1 = c("c", "d"), y1 = c(3, 4), row.names = c("1", "3"))
dfB1 <- data.frame(x2 = c("c", "d"), y2 = c(3, 4), row.names = c("1", "2"))
dfB2 <- data.frame(x2 = c("e", "f"), y2 = c(5, 6), row.names = c("2", "3"))
listA <- list(dfA1, dfA2) # first input list
listB <- list(dfB1, dfB2) # second input list
m1 <- merge(dfA1, dfB1, by = 0, all = TRUE)
m2 <- merge(dfA2, dfB2, by = 0, all = TRUE)
listC <- list(m1, m2) # desired output list
I found the following solution:
listC <- mapply(function(x, y) merge(x, y, by = 0, all = TRUE), x = listA, y = listB, SIMPLIFY = FALSE)
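The same pairwise merge can also be written with `Map`, which always returns a list, so `SIMPLIFY = FALSE` is not needed. This is a sketch that rebuilds the question's inputs so it runs on its own.

```r
dfA1 <- data.frame(x1 = c("a", "b"), y1 = c(1, 2), row.names = c("1", "2"))
dfA2 <- data.frame(x1 = c("c", "d"), y1 = c(3, 4), row.names = c("1", "3"))
dfB1 <- data.frame(x2 = c("c", "d"), y2 = c(3, 4), row.names = c("1", "2"))
dfB2 <- data.frame(x2 = c("e", "f"), y2 = c(5, 6), row.names = c("2", "3"))
listA <- list(dfA1, dfA2)
listB <- list(dfB1, dfB2)

# Merge the i-th element of listA with the i-th element of listB
# by row names (by = 0), keeping unmatched rows from both sides
listC <- Map(function(x, y) merge(x, y, by = 0, all = TRUE), listA, listB)
listC
```

Both lists are assumed to have the same length and matching order, since the pairing is purely positional.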

Resources