I have a data frame generated by a function.
Each call returns a different number of rows, for example:
structure(list(a = c(1, 2, 3), b = c("er", "gd", "ku"), c = c(43,
453, 12)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
structure(list(a = c(1, 2), b = c("er", "gd"), c = c(43, 453)), .Names = c("a",
"b", "c"), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
I want to keep binding rows until the total number of rows reaches at least n (n = 4, 100, 4242, ...), the way a while loop would.
How can I do this with functional programming, without a while loop?
The total may overshoot: with n = 10, the combined data frame might have 7 rows before the last bind_rows and 20 after it. That's fine; I want the first total k with k >= n.
Here is my while loop doing this:
b <- list()
total_rows <- 0
while (total_rows < 1000) {
  df <- f_produce_rand_df()
  b[[length(b) + 1]] <- df
  total_rows <- total_rows + nrow(df)
}
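One functional way to express the loop above is a recursive accumulator. The following is only a minimal sketch: `collect_until` and its arguments are hypothetical names, and the `f_produce_rand_df` defined here is a stand-in for the real generator, which the question does not show. It assumes the generator always returns at least one row.

```r
# Stand-in for the real generator (assumption: every call returns a
# data frame with at least one row and the same columns).
f_produce_rand_df <- function() {
  n <- sample(1:5, 1)
  data.frame(a = seq_len(n), b = sample(letters, n, replace = TRUE), c = runif(n))
}

# Recursively collect data frames until their combined row count is >= n.
collect_until <- function(n, produce, acc = list(), total = 0) {
  if (total >= n) return(acc)
  df <- produce()
  collect_until(n, produce, c(acc, list(df)), total + nrow(df))
}

set.seed(42)
b <- collect_until(20, f_produce_rand_df)
result <- do.call(rbind, b)   # dplyr::bind_rows(b) should work as well
nrow(result)                  # at least 20
```

Each generated data frame adds one level of recursion, so for a very large n with small data frames the plain while loop may remain the more practical choice.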
I am trying to build sequence data for a recommender system. I have built a cross-tabulated table (Table 1) and a second table (Table 2):
[Image: Table 1 and Table 2]
I have been trying to replace all the 1s in Table 1 with the corresponding "Grade" from Table 2 in R.
Any insight or suggestion is greatly appreciated.
Instead of replacing values in the first table with those from the second, the second table can be reshaped directly to 'wide' format with dcast:
library(reshape2)
res <- dcast(df2, St.No. ~ Courses, value.var = 'Grade')[names(df1)]
res
#  St.No. Math Phys Chem CS
#1      1         A       B
#2      2         B    B
#3      3    A         A  C
#4      4    B    B       D
If we need to replace the blanks with "0":
res[res == ""] <- "0"
data
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4), Courses = rep(c("Math",
"Phys", "Chem", "CS"), 4),
Grade = c("", "A", "", "B", "", "B", "B", "",
"A", "", "A", "C", "B", "B", "", "D"),
stringsAsFactors = FALSE)
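For comparison, the literal replacement asked for in the question can be sketched in base R with matrix indexing. This is an illustrative sketch, not part of the dcast answer: the names `m`, `has_grade`, `idx` and `res2` are made up here, and it assumes every (St.No., Courses) pair in df2 has a matching row and column in df1.

```r
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
                  Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4),
                  Courses = rep(c("Math", "Phys", "Chem", "CS"), 4),
                  Grade = c("", "A", "", "B", "", "B", "B", "",
                            "A", "", "A", "C", "B", "B", "", "D"),
                  stringsAsFactors = FALSE)

# Work on a character matrix of the course columns so grades can be stored.
m <- as.matrix(df1[-1])
storage.mode(m) <- "character"   # 0/1 become "0"/"1"

# One (row, column) index pair per non-blank grade in df2.
has_grade <- df2$Grade != ""
idx <- cbind(match(df2$St.No.[has_grade], df1$St.No.),
             match(df2$Courses[has_grade], colnames(m)))
m[idx] <- df2$Grade[has_grade]

res2 <- data.frame(St.No. = df1$St.No., m, stringsAsFactors = FALSE)
res2
```

Cells that are 0 in df1 stay "0". A cell that is 1 in df1 but has a blank grade in df2 would stay "1"; that case does not occur in this data.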
I have nested list data that I need in a clean output representation: either a matrix-like object, or the nested lists exported directly as CSV files. I tried several general approaches, but exporting the nested list did not go well, so I am looking for a way to cast a nested list to a matrix-like or tabular object that holds the data in the desired layout. Perhaps I could hold the nested list in a data.table, but I am not sure about this. How can I do this sort of manipulation easily and get a clean, well-structured representation of nested list data? Thanks a lot.
Mini example:
Output of a custom function:
AcceptedList <- list(
A_accepted = data.frame(pos.start=c(1,6,16), pos.stop=c(4,12,23), pos.ID=c("A1","A2","A3"), pos.score=c(11,8,13)),
B_accepted = data.frame(pos.start=c(7,19,31), pos.stop=c(13,28,43), pos.ID=c("B3","B6","B7"), pos.score=c(12,5,7)),
C_accepted = data.frame(pos.start=c(5,21,36), pos.stop=c(11,29,42), pos.ID=c("C2","C4","C9"), pos.score=c(7,13,9))
)
RejectedList <- list(
A_rejected = data.frame(pos.start=c(6,25,40), pos.stop=c(12,33,49), pos.ID=c("A2","A5","A8"), pos.score=c(8,4,7)),
B_rejected = data.frame(pos.start=c(15,19,47), pos.stop=c(18,28,55), pos.ID=c("B4","B6","B9"), pos.score=c(10,5,14)),
C_rejected = data.frame(pos.start=c(13,21,36,53), pos.stop=c(19,29,42,67), pos.ID=c("C3","C4","C9","C12"), pos.score=c(4,13,9,17))
)
I then wrote this function to manipulate the output one step further:
func <- function(mlist, threshold) {
  # split each data frame into "up" / "down" parts by score
  lapply(mlist, function(x) {
    split(x, ifelse(x$pos.score >= threshold, "up", "down"))
  })
}
# example
.res_accepted <- func(AcceptedList, 9)
.res_rejected <- func(RejectedList, 9)
I am having a hard time casting the nested lists .res_accepted and .res_rejected to a matrix-like object. Ideally I would export the nested lists as CSV files, but I have failed to export them in the desired way. How can I make this happen?
Ultimately, I want a set of CSV files named as follows:
A_accepted_up.csv
A_accepted_down.csv
A_rejected_up.csv
A_rejected_down.csv
B_accepted_up.csv
B_accepted_down.csv
B_rejected_up.csv
B_rejected_down.csv
C_accepted_up.csv
C_accepted_down.csv
C_rejected_up.csv
C_rejected_down.csv
The point is that the nested lists are returned by my custom functions, so I intend either to export them directly or to cast them to a matrix-like object. Any ideas for this sort of manipulation? Thanks :)
This returns a data.frame DF of the data. No packages are used.
both <- do.call("rbind", c(AcceptedList, RejectedList))
cn <- c("letter", "accepted", "seq")
DF <- cbind(
read.table(text = chartr("_", ".", rownames(both)), sep = ".", col.names = cn),
both)
DF <- transform(DF, updown = ifelse(pos.score > 8, "up", "down"))
giving:
> DF
             letter accepted seq pos.start pos.stop pos.ID pos.score updown
A_accepted.1      A accepted   1         1        4     A1        11     up
A_accepted.2      A accepted   2         6       12     A2         8   down
A_accepted.3      A accepted   3        16       23     A3        13     up
B_accepted.1      B accepted   1         7       13     B3        12     up
B_accepted.2      B accepted   2        19       28     B6         5   down
B_accepted.3      B accepted   3        31       43     B7         7   down
C_accepted.1      C accepted   1         5       11     C2         7   down
C_accepted.2      C accepted   2        21       29     C4        13     up
C_accepted.3      C accepted   3        36       42     C9         9     up
A_rejected.1      A rejected   1         6       12     A2         8   down
A_rejected.2      A rejected   2        25       33     A5         4   down
A_rejected.3      A rejected   3        40       49     A8         7   down
B_rejected.1      B rejected   1        15       18     B4        10     up
B_rejected.2      B rejected   2        19       28     B6         5   down
B_rejected.3      B rejected   3        47       55     B9        14     up
C_rejected.1      C rejected   1        13       19     C3         4   down
C_rejected.2      C rejected   2        21       29     C4        13     up
C_rejected.3      C rejected   3        36       42     C9         9     up
C_rejected.4      C rejected   4        53       67    C12        17     up
This will write DF out in separate files:
junk <- by(DF, DF[c("letter", "accepted", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$accepted[1], x$updown[1])))
Alternatively, this writes out the data frames in .res_accepted; .res_rejected could be handled similarly:
junk <- lapply(names(.res_accepted), function(nm)
mapply(write.csv,
.res_accepted[[nm]],
paste0(nm, "_", names(.res_accepted[[nm]]), ".csv")))
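A variant that handles any such nested list in one call might look like the sketch below. `write_nested` and the toy `nested` list are hypothetical names introduced for illustration; the toy list only mimics the shape of the output of func(). It assumes the outer names contain no "." of their own, since the first "." in each flattened name is taken to be the outer/inner separator.

```r
# Toy nested list shaped like the output of func(); contents are illustrative.
nested <- list(
  A_accepted = list(down = data.frame(pos.ID = "A2", pos.score = 8),
                    up   = data.frame(pos.ID = c("A1", "A3"), pos.score = c(11, 13))),
  B_rejected = list(up   = data.frame(pos.ID = "B4", pos.score = 10))
)

write_nested <- function(nested, dir) {
  # Flatten one level: names become "<outer>.<inner>", e.g. "A_accepted.up".
  flat <- unlist(nested, recursive = FALSE)
  # Turn the first "." into "_" to get names like "A_accepted_up.csv".
  files <- file.path(dir, paste0(sub(".", "_", names(flat), fixed = TRUE), ".csv"))
  Map(function(d, f) write.csv(d, f, row.names = FALSE), flat, files)
  invisible(files)
}

# Write into a fresh temporary directory so nothing is overwritten.
out_dir <- tempfile("nested_csv")
dir.create(out_dir)
files <- write_nested(nested, out_dir)
basename(files)
```

Calling `write_nested(.res_accepted, out_dir)` and `write_nested(.res_rejected, out_dir)` should then produce the file names listed in the question.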
Note: The poster changed the data after this answer had already been posted. The output above corresponds to the original data; however, the code should also work for the revised data. The original data was:
AcceptedList <-
structure(list(foo_accepted = structure(list(pos.start = c(1,
6, 16), pos.stop = c(4, 12, 23), pos.ID = structure(1:3, .Label = c("A1",
"A2", "A3"), class = "factor"), pos.score = c(11, 8, 13)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L), class = "data.frame"),
bar_accepted = structure(list(pos.start = c(7, 19, 31), pos.stop = c(13,
28, 43), pos.ID = structure(1:3, .Label = c("B3", "B6", "B7"
), class = "factor"), pos.score = c(12, 5, 7)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L
), class = "data.frame"), cat_accepted = structure(list(pos.start = c(5,
21, 36), pos.stop = c(11, 29, 42), pos.ID = structure(1:3, .Label = c("C2",
"C4", "C9"), class = "factor"), pos.score = c(7, 13, 9)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L
), class = "data.frame")), .Names = c("foo_accepted", "bar_accepted",
"cat_accepted"))
RejectedList <-
structure(list(foo_rejected = structure(list(pos.start = c(6,
25, 40), pos.stop = c(12, 33, 49), pos.ID = structure(1:3, .Label = c("A2",
"A5", "A8"), class = "factor"), pos.score = c(8, 4, 7)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L), class = "data.frame"),
bar_rejected = structure(list(pos.start = c(15, 19, 47),
pos.stop = c(18, 28, 55), pos.ID = structure(1:3, .Label = c("B4",
"B6", "B9"), class = "factor"), pos.score = c(10, 5,
14)), .Names = c("pos.start", "pos.stop", "pos.ID", "pos.score"
), row.names = c(NA, -3L), class = "data.frame"), cat_rejected = structure(list(
pos.start = c(13, 21, 36, 53), pos.stop = c(19, 29, 42,
67), pos.ID = structure(c(2L, 3L, 4L, 1L), .Label = c("C12",
"C3", "C4", "C9"), class = "factor"), pos.score = c(4,
13, 9, 17)), .Names = c("pos.start", "pos.stop", "pos.ID",
"pos.score"), row.names = c(NA, -4L), class = "data.frame")),
.Names = c("foo_rejected",
"bar_rejected", "cat_rejected"))
I am trying to show how many complete observations there are per ID without using the complete.cases function or any package.
If I use na.omit to filter out the NA values, I lose all of the IDs that have ZERO complete cases.
In the end, I'd like a frequency table with two columns: ID and Number of Complete Observations.
> length(unique(data$ID))
[1] 332
> head(data)
ID value
1 1 NA
2 1 NA
3 1 NA
4 1 NA
5 1 NA
6 1 NA
> dim(data)
[1] 772087 2
When I write my own function z, which counts non-NA values, and apply it with aggregate(), the IDs with zero complete observations are left out. I should get 332 rows, not 323. How can this be resolved using base functions?
z <- function(x) {
  sum(!is.na(x))
}
aggregate(value ~ ID, data = data, FUN = "z")
> nrow(aggregate(value ~ ID, data = data, FUN = "z"))
[1] 323
One way to do this is with table:
df2 <- table(df$Id, !is.na(df$value))[,2]
data.frame(ID = names(df2), value = df2)
Data
df <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(NA,
1, 1, 2, 2, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
In base R, you can use your utility function like this:
stack(by(data$value, data$ID, FUN=function(x) sum(!is.na(x))))
You can use table directly for this purpose. Sample code:
df1 <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(2,
1, 1, NA, NA, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
df2 <- as.data.frame.matrix(with(df1, table(Id, value)))
resultDf <- data.frame(Id=row.names(df2), count=apply(df2, 1, sum))
resultDf
The code makes a table of Id against value; table drops NA values by default, so summing each row gives the number of non-NA values per Id, including Ids whose count is zero. Hope this is easy to understand and helps.
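Another base R option keeps the original aggregate() call. The formula interface of aggregate() uses na.action = na.omit by default, so rows with NA are removed before grouping and an ID whose values are all NA disappears. Passing na.action = na.pass keeps those rows. The sketch below uses a small made-up data frame in place of the 772087-row data from the question.

```r
# Small made-up example; ID 1 has no complete observations.
data <- data.frame(ID = c(1, 1, 2, 2, 3), value = c(NA, NA, 5, NA, 7))

z <- function(x) sum(!is.na(x))

# Default: NA rows are dropped first, so ID 1 is missing from the result.
dropped <- aggregate(value ~ ID, data = data, FUN = z)

# na.pass keeps the NA rows, so every ID gets a count (possibly 0).
kept <- aggregate(value ~ ID, data = data, FUN = z, na.action = na.pass)
kept
```

Here `dropped` has rows only for IDs 2 and 3, while `kept` has one row per ID with counts 0, 1 and 1.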
I have two lists of data frames: listA and listB. How can I get a list of merged data frames (listC)?
dfA1 <- data.frame(x1 = c("a", "b"), y1 = c(1, 2), row.names = c("1", "2"))
dfA2 <- data.frame(x1 = c("c", "d"), y1 = c(3, 4), row.names = c("1", "3"))
dfB1 <- data.frame(x2 = c("c", "d"), y2 = c(3, 4), row.names = c("1", "2"))
dfB2 <- data.frame(x2 = c("e", "f"), y2 = c(5, 6), row.names = c("2", "3"))
listA <- list(dfA1, dfA2) # first input list
listB <- list(dfB1, dfB2) # second input list
m1 <- merge(dfA1, dfB1, by = 0, all = TRUE)
m2 <- merge(dfA2, dfB2, by = 0, all = TRUE)
listC <- list(m1, m2) # desired output list
I found the following solution:
listC <- mapply(function(x, y) merge(x, y, by = 0, all = TRUE), x = listA, y = listB, SIMPLIFY = FALSE)