Performance indices for unequal datasets in R

Performance indices for unequal datasets in R - r

I wanted to do the performance indices in R. My data looks like this (example):
enter image description here
I want to ignore the comparison of values in Time 2 and 4 in data frame 1 and then compare it with the available set of observed data. I know how to develop the equation for the performance indicators (R2, RMSE, IA, etc.), but I am not sure how to ignore the data in the simulated data frame when corresponding observed data is not available for comparison.

Perhaps just do a left join, and compare the columns directly?
library(dplyr)
left_join(d2,d1 %>% rename(simData=Data), by="Time")
Output:
Time Data simData
<dbl> <dbl> <dbl>
1 1 57 52
2 3 88 78
3 5 19 23
Input:
d1 = structure(list(Time = c(1, 2, 3, 4, 5), Data = c(52, 56, 78,
56, 23)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L))
d2 = structure(list(Time = c(1, 3, 5), Data = c(57, 88, 19)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))

Related

Combine ifelse two conditions and loop

I have a liste of dataframes (file1, file2, ..., file 72). For each dataframe I want to create one variable containing information from another dataframe based on two conditions.
The idea is simple:
condition 1: if file*$countryid equals source$country, and
condition 2: if file*$year is higher than source$starting but lower than source$ending, then if true I want to create a column file*$rank with the value in source$rank
I have been trying code lines like this but this code does not go through all lines in source:
file1$rank<-ifelse(file1$countryid=source$countryid & file1$year>source$starting & file1$year<source$ending,source$rank,NA)
In addition I would like to implement this within a loop to avoid iterating manually through all these dataframes:
dflist<-Filter(is.data.frame, mget(ls()))
dflist<-function(df,x){df$rank<-ifelse(df$countryid=source$countryid & df$year>source$starting & df$year<source$ending,source$rank,NA))
Here is an example of the data I have.
Thank you!
> dput(file1)
structure(list(id = c(1, 2, 3), countryid = c(10, 10, 13), year = c(1948,
1954, 1908)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
dput(file2)
structure(list(id = c(1, 2, 3), countryid = c(13, 10, 13), year = c(1907,
1908, 1907)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
> dput(source)
structure(list(country = c(13, 13, 13, 10, 10, 10), rank = c(1,
2, 3, 1, 2, 3), starting = c(1885, 1909, 1940, 1902, 1907, 1931
), ending = c(1908, 1939, 1960, 1906, 1930, 1960)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

We can use a non-equi join after getting all the file\\d+ datasets into a list
library(data.table)
out <- lapply(mget(ls(pattern = '^file\\d+$')), function(dat)
setDT(dat)[, year := as.integer(year)][as.data.table(source), rank := i.rank,
on = .(countryid = country, year > starting, year < ending)])
-output
out
#$file1
# id countryid year rank
#1: 1 10 1948 3
#2: 2 10 1954 3
#3: 3 13 1908 NA
#$file2
# id countryid year rank
#1: 1 13 1907 1
#2: 2 10 1908 2
#3: 3 13 1907 1
if it needs to update the original objects, use list2env
list2env(out, .GlobalEnv)

Multiply 2 very large data frames in R

I have 2 dataframes in R as below
Data Frame 1
structure(list(X1 = c(1, 4, 3), X2 = c(2, 1, 2), X3 = c(3, 1,
1)), class = "data.frame", row.names = c(NA, -3L))
Data Frame 2
structure(list(X1 = c(0.5, 0.1), X2 = c(0.7, 0.2), X3 = c(0.3,
0.2)), class = "data.frame", row.names = c(NA, -2L))
I want to multiply each row of DF1 with every row of DF2 and perform some calculation as below. This is a sort of matrix multiplication along with additional calculations
After matrix multiplication, I will calculate 1/(1+exp(-x)) for every cell in resultant matrix
and lastly, take the column sum of the matrix
The above dataset is just a dummy set. In actual, DF1 has 1.1 million rows while DF2 has 65000 rows.
While doing matrix multiplication, I get error
cannot allocate vector of Size 560 GB
Is there any alternative to this. Also, I am looking for time effective solution due to large data frames.
May be Data table ?
Thanks,

use mutate_if by subtracting from another data frame

I would like to do (more or less) the following
dplyr::mutate_if(tmp, is.numeric, function(x) x-df[3,])
in effect this should subtract at every x a value from df. The problem I have is that it should only use the matching column number, i.e. tmp[x,y] - df[3,y].
However what's happening is that it loops over the df[3,] vector for every x, irrespective of column position.
Is there any way to make this work with mutate_if by indexing the column somehow, which would be my preferred solution?
here is an example:
tmp is:
tmp <- structure(list(x = c(1, 1, 1, 1),
y = c(2, 2, 2, 2)),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
df (actually a matrix) is:
df <- structure(c(1L, 2L, 3L, 2L, 3L, 4L),
.Dim = 3:2, .Dimnames = list(NULL, c("x", "y")))
now when I apply mutate it returns:
structure(list(x = c(-2, -3, -2, -3),
y = c(-1, -2, -1, -2)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
but I want it to be:
structure(list(x = c(-2, -2, -2, -2),
y = c(-2, -2, -2, -2)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
I hope that makes it clearer

We can use purrr:
df1<-as.data.frame(df)
as_tibble(purrr::map2(tmp[,purrr::map_lgl(tmp,is.numeric)],df1[3,],function(x,y) x-y))
This gives us:
# A tibble: 4 x 2
x y
<dbl> <dbl>
1 -2 -2
2 -2 -2
3 -2 -2
4 -2 -2

This isn't a perfect solution, but it will get you what you want (if my understanding is correct), and then you will have to play with the formatting. I don't quite understand why you have an entire data frame for df if you only care about the 3rd row. I don't know how to index the column using dplyr::mutate_if either; that would be useful to know!
Since you want the columns to match, you are effectively trying to subtract each row of tmp from a set row of df. For loops and sapply() are good for row-wise subtraction.
sapply(1:nrow(tmp), function(x) tmp[x, ] - df[3, ]) %>%
as.data.frame() %>%
t()
## x y
## V1 -2 -2
## V2 -2 -2
## V3 -2 -2
## V4 -2 -2

Control number of rows when binding dataframes with different number of rows?

I have a dataframe generated by a function:
Each time it's of different number of rows:
structure(list(a = c(1, 2, 3), b = c("er", "gd", "ku"), c = c(43,
453, 12)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
structure(list(a = c(1, 2), b = c("er", "gd"), c = c(43, 453)), .Names = c("a",
"b", "c"), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
I want to be able like in a while loop to control the number of rows to be less then n (n = 4, 100, 4242...) when I bind rows.
Please advise how to do this using functional programming without a while loop?
I mean sometimes you will get n = 10 and the df before bind_rows is 7 and after binding the last one it will be 20. It's ok, I want the number of rows to be min_k (k >= n)
Here is my while loop doing this:
b <- list()
total_rows <- 0
while(total_rows < 1000) {
df <- f_produce_rand_df()
b[[length(b) + 1]] <- df
total_rows <- total_rows + nrow(df)
}

How to cast nested list to matrix like or tabular like object?

I have nested list data that needed to be in desired output representation, either matrix like object or just directly export these nested list as csv file. I tried several general approach to do this task, but exporting nested list is not going well, so I am looking for the solution that casting nested list to matrix like or tabular like object to hold data in desired way. Maybe I could hold nested list data in data.table, but not quite sure about this. Can anyone tell me how to do this sort of manipulation easily ? How can I achieve clean, well structured data representation for nested list data ? Any idea ? Thanks a lot
mini example :
output of custom function:
AcceptedList <- list(
A_accepted = data.frame(pos.start=c(1,6,16), pos.stop=c(4,12,23), pos.ID=c("A1","A2","A3"), pos.score=c(11,8,13)),
B_accepted = data.frame(pos.start=c(7,19,31), pos.stop=c(13,28,43), pos.ID=c("B3","B6","B7"), pos.score=c(12,5,7)),
C_accepted = data.frame(pos.start=c(5,21,36), pos.stop=c(11,29,42), pos.ID=c("C2","C4","C9"), pos.score=c(7,13,9))
)
RejectedList <- list(
A_rejected = data.frame(pos.start=c(6,25,40), pos.stop=c(12,33,49), pos.ID=c("A2","A5","A8"), pos.score=c(8,4,7)),
B_rejected = data.frame(pos.start=c(15,19,47), pos.stop=c(18,28,55), pos.ID=c("B4","B6","B9"), pos.score=c(10,5,14)),
C_rejected = data.frame(pos.start=c(13,21,36,53), pos.stop=c(19,29,42,67), pos.ID=c("C3","C4","C9","C12"), pos.score=c(4,13,9,17))
)
so I implement this function to further manipulate output one more step :
func <- function(mlist, threshold) {
res <- lapply(mlist, function(x) {
splt <- split(x, ifelse(x$pos.score >= threshold, "up", "down"))
})
return(res)
}
#example
.res_accepted <- func(AcceptedList, 9)
.res_rejected <- func(RejectedList, 9)
I have hard time how to case nested list .res_accepted, .res_rejected as matrix like object. Ideally exporting nested list as csv file is highly expected, but I failed to export them in desired way. How can I make this happen ?
ultimately, desired list of csv files with desired named as follows:
A_accepted_up.csv
A_accepted_down.csv
A_rejected_up.csv
A_rejected_down.csv
B_accepted_up.csv
B_accepted_down.csv
B_rejected_up.csv
B_rejected_down.csv
C_accepted_up.csv
C_accepted_down.csv
C_rejected_up.csv
C_rejected_down.csv
The point is, nested list returned by my custom functions, so I intend to either directly export them or cast them into matrix like object as well. Any idea for this sort of manipulation ? Thanks:)

This returns a data.frame DF of the data. No packages are used.
both <- do.call("rbind", c(AcceptedList, RejectedList))
cn <- c("letter", "accepted", "seq")
DF <- cbind(
read.table(text = chartr("_", ".", rownames(both)), sep = ".", col.names = cn),
both)
DF <- transform(DF, updown = ifelse(pos.score > 8, "up", "down"))
giving:
> DF
letter accepted seq pos.start pos.stop pos.ID pos.score updown
A_accepted.1 A accepted 1 1 4 A1 11 up
A_accepted.2 A accepted 2 6 12 A2 8 down
A_accepted.3 A accepted 3 16 23 A3 13 up
B_accepted.1 B accepted 1 7 13 B3 12 up
B_accepted.2 B accepted 2 19 28 B6 5 down
B_accepted.3 B accepted 3 31 43 B7 7 down
C_accepted.1 C accepted 1 5 11 C2 7 down
C_accepted.2 C accepted 2 21 29 C4 13 up
C_accepted.3 C accepted 3 36 42 C9 9 up
A_rejected.1 A rejected 1 6 12 A2 8 down
A_rejected.2 A rejected 2 25 33 A5 4 down
A_rejected.3 A rejected 3 40 49 A8 7 down
B_rejected.1 B rejected 1 15 18 B4 10 up
B_rejected.2 B rejected 2 19 28 B6 5 down
B_rejected.3 B rejected 3 47 55 B9 14 up
C_rejected.1 C rejected 1 13 19 C3 4 down
C_rejected.2 C rejected 2 21 29 C4 13 up
C_rejected.3 C rejected 3 36 42 C9 9 up
C_rejected.4 C rejected 4 53 67 C12 17 up
This will write DF out in separate files:
junk <- by(DF, DF[c("letter", "accepted", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$accepted[1], x$updown[1])))
or this will write out the data frames in .res_accepted -- .res_rejected could be handled similarly:
junk <- lapply(names(.res_accepted), function(nm)
mapply(write.csv,
.res_accepted[[nm]],
paste0(nm, "_", names(.res_accepted[[nm]]), ".csv")))
Note: The poster changed the data after this answer already had appeared. The output above corresponds to the original data; however, it should also work for the revised data. The original data was:
AcceptedList <-
structure(list(foo_accepted = structure(list(pos.start = c(1,
6, 16), pos.stop = c(4, 12, 23), pos.ID = structure(1:3, .Label = c("A1",
"A2", "A3"), class = "factor"), pos.score = c(11, 8, 13)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L), class = "data.frame"),
bar_accepted = structure(list(pos.start = c(7, 19, 31), pos.stop = c(13,
28, 43), pos.ID = structure(1:3, .Label = c("B3", "B6", "B7"
), class = "factor"), pos.score = c(12, 5, 7)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L
), class = "data.frame"), cat_accepted = structure(list(pos.start = c(5,
21, 36), pos.stop = c(11, 29, 42), pos.ID = structure(1:3, .Label = c("C2",
"C4", "C9"), class = "factor"), pos.score = c(7, 13, 9)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L
), class = "data.frame")), .Names = c("foo_accepted", "bar_accepted",
"cat_accepted"))
RejectedList <-
structure(list(foo_rejected = structure(list(pos.start = c(6,
25, 40), pos.stop = c(12, 33, 49), pos.ID = structure(1:3, .Label = c("A2",
"A5", "A8"), class = "factor"), pos.score = c(8, 4, 7)), .Names = c("pos.start",
"pos.stop", "pos.ID", "pos.score"), row.names = c(NA, -3L), class = "data.frame"),
bar_rejected = structure(list(pos.start = c(15, 19, 47),
pos.stop = c(18, 28, 55), pos.ID = structure(1:3, .Label = c("B4",
"B6", "B9"), class = "factor"), pos.score = c(10, 5,
14)), .Names = c("pos.start", "pos.stop", "pos.ID", "pos.score"
), row.names = c(NA, -3L), class = "data.frame"), cat_rejected = structure(list(
pos.start = c(13, 21, 36, 53), pos.stop = c(19, 29, 42,
67), pos.ID = structure(c(2L, 3L, 4L, 1L), .Label = c("C12",
"C3", "C4", "C9"), class = "factor"), pos.score = c(4,
13, 9, 17)), .Names = c("pos.start", "pos.stop", "pos.ID",
"pos.score"), row.names = c(NA, -4L), class = "data.frame")),
.Names = c("foo_rejected",
"bar_rejected", "cat_rejected"))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Performance indices for unequal datasets in R - r

Related

Combine ifelse two conditions and loop

Multiply 2 very large data frames in R

use mutate_if by subtracting from another data frame

Control number of rows when binding dataframes with different number of rows?

How to cast nested list to matrix like or tabular like object?

Categories

Resources