how to multiply multiple df columns - r

I have a df with a number of columns.
I want to multiply each column by its own fixed constant.
I am looking for the best strategy to achieve this using purrr (I am still trying to get my head around map etc.).
library(tidyverse)
library(lubridate)
df1 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
"2019-02-05")),
x = c(1, 2, 3, 4, 5),
y = c(2, 3, 4, 5, 6),
z = c(3, 4, 5, 6, 7)
)
The constants to multiply each column by are as follows:
c(10, 20, 30)
This is the output I expect:
data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
"2019-02-05")),
x = c(10, 20, 30, 40, 50),
y = c(40, 60, 80, 100, 120),
z = c(90, 120, 150, 180, 210)
)

We can use map2 from purrr (part of the tidyverse) to achieve this.
df1[2:4] <- map2(df1[2:4], c(10, 20, 30), ~.x * .y)
df1
# date x y z
# 1 2019-02-01 10 40 90
# 2 2019-02-02 20 60 120
# 3 2019-02-03 30 80 150
# 4 2019-02-04 40 100 180
# 5 2019-02-05 50 120 210
The base R equivalent is mapply.
df1[2:4] <- mapply(FUN = function(x, y) x * y, df1[2:4], c(10, 20, 30), SIMPLIFY = FALSE)
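Both calls above pair columns and constants by position, so they depend on the column order. Below is a minimal sketch of a name-based variant. It assumes the constants are kept in a named vector (called `mult` here, a name that is not in the original post) and uses `purrr::imap`, which passes each column together with its name. `as.Date` stands in for `ymd` only to keep the sketch free of lubridate.

```r
library(purrr)

df1 <- data.frame(
  date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
                   "2019-02-05")),
  x = c(1, 2, 3, 4, 5),
  y = c(2, 3, 4, 5, 6),
  z = c(3, 4, 5, 6, 7)
)

# Hypothetical named vector: each constant is keyed by the column it scales
mult <- c(x = 10, y = 20, z = 30)

# imap() calls the function with the column (.x) and its name (.y),
# so the constant is looked up by name rather than by position
df1[names(mult)] <- imap(df1[names(mult)], ~ .x * mult[[.y]])
df1
```

With this form, reordering the columns of df1 or the entries of `mult` should not change the result, as long as the names match.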

Related

Match two equal-sized data.frames and then filter results on a third

I have the following three data.frame:
area1 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(0, 100, 0),
sub_ua2 = c(100, 100, 100),
sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(100, 100, 0),
sub_ua2 = c(100, 100, 0),
sub_ua3 = c(100, 0, 0))
df <- data.frame(ua = c(rep(1, 5), rep(2, 4), rep(3, 7)),
subua = c(rep("sub_ua1", 3), "sub_ua2", "sub_ua3",
"sub_ua1", "sub_ua1", "sub_ua2", "sub_ua3",
"sub_ua1", c(rep("sub_ua2", 2)), rep("sub_ua3", 4)),
value = c(rep(2, 3), rep(4, 3), rep(2, 2), rep(1, 8)))
What I'm trying to do is, for each ua, keep only the sub_ua columns (1 to 3) that have a value of 100 in both area1 and area2. For example, the first value of sub_ua2 is 100 in both area1 and area2. This is a "sub_ua" I want.
Then, with this list of "sub_ua" per "ua", keep only the matching rows of df to obtain the filtered value.
The results should be:
For ua == 1, get both sub_ua2 and sub_ua3
For ua == 2, get both sub_ua1 and sub_ua2
For ua == 3, get nothing: with the data above, sub_ua2 is 100 in area1 but 0 in area2
EDIT:
I was using the following approach to obtain a data.frame of rows and columns indices:
library(prodlim)
# Indices for data frame 1 and 2 for values = 100
indices_1 <- which(area1 == 100, arr.ind = TRUE)
indices_2 <- which(area2 == 100, arr.ind = TRUE)
# Rows where indices are matched between the two data frame indices
indices_rows <- na.omit(row.match(as.data.frame(indices_1), as.data.frame(indices_2)))
# Row-column indices where both data frames have values of 100
indices_2[indices_rows, ]
I just don't know how to use this to filter the final dataset df.
If I understood correctly this should work:
area1 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(0, 100, 0),
sub_ua2 = c(100, 100, 100),
sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(100, 100, 0),
sub_ua2 = c(100, 100, 0),
sub_ua3 = c(100, 0, 0))
library(dplyr)
library(tidyr)
area1 %>%
left_join(area2, by = "ua", suffix = c(".area1",".area2")) %>%
pivot_longer(cols = -ua,names_to = "var",values_to = "value") %>%
separate(col = var,into = c("var","area"),sep = "\\.") %>%
pivot_wider(names_from = area,values_from = value) %>%
filter(area1 == 100, area2 == 100) %>%
select(-starts_with("area"))
# A tibble: 4 x 2
ua var
<dbl> <chr>
1 1 sub_ua2
2 1 sub_ua3
3 2 sub_ua1
4 2 sub_ua2
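The result above lists the matching ua / sub_ua pairs but does not yet filter df. One possible way to finish is sketched below with the question's data: reshape each area table to long form, keep the pairs that equal 100 in both, then `semi_join` df against them. The names `matches` and `filtered` are illustrative and not from the original post.

```r
library(dplyr)
library(tidyr)

area1 <- data.frame(ua = c(1, 2, 3),
                    sub_ua1 = c(0, 100, 0),
                    sub_ua2 = c(100, 100, 100),
                    sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
                    sub_ua1 = c(100, 100, 0),
                    sub_ua2 = c(100, 100, 0),
                    sub_ua3 = c(100, 0, 0))
df <- data.frame(ua = c(rep(1, 5), rep(2, 4), rep(3, 7)),
                 subua = c(rep("sub_ua1", 3), "sub_ua2", "sub_ua3",
                           "sub_ua1", "sub_ua1", "sub_ua2", "sub_ua3",
                           "sub_ua1", rep("sub_ua2", 2), rep("sub_ua3", 4)),
                 value = c(rep(2, 3), rep(4, 3), rep(2, 2), rep(1, 8)))

# Long form: one row per (ua, subua) pair, with the value from each area table
matches <- inner_join(
  pivot_longer(area1, -ua, names_to = "subua", values_to = "v1"),
  pivot_longer(area2, -ua, names_to = "subua", values_to = "v2"),
  by = c("ua", "subua")
) %>%
  filter(v1 == 100, v2 == 100) %>%
  select(ua, subua)

# Keep only the rows of df whose (ua, subua) pair is 100 in both tables
filtered <- semi_join(df, matches, by = c("ua", "subua"))
filtered
```

`semi_join` only filters rows and adds no columns, so `filtered` keeps the shape and row order of df.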

Bar plot in loop for each observation

Here is sample data where ID is a categorical variable.
ID <- c(12, 34, 560, 45, 235)
W1 <- c(0, 5, 7, 6, 0)
W2 <- c(7, 8, 9, 5, 2)
W3 <- c(0, 0, 3, 5, 9)
df <- data.frame(ID, W1, W2, W3)
df$ID <- as.factor(df$ID)
I want to draw one bar plot for each of these five IDs using the frequency data for the three weeks W1:W3. In the actual dataset, I have 30+ weeks and around 150 IDs, so the intention here is to do this efficiently. Nothing fancy, but ggplot would be ideal as I would need to manipulate some aesthetics.
How can I do this using a loop and save the images in one PDF file?
Thanks for your help!
This sort of problem is usually a data reformatting problem. See reshaping data.frame from wide to long format. After reshaping the data, the plot is faceted by ID, avoiding loops.
library(ggplot2)
ID <- c(12, 34, 560, 45, 235)
W1 <- c(0, 5, 7, 6, 0)
W2 <- c(7, 8, 9, 5, 2)
W3 <- c(0, 0, 3, 5, 9)
df <- data.frame(ID, W1, W2, W3)
df$ID <- as.factor(df$ID)
df[-1] <- lapply(df[-1], as.integer)
df |>
tidyr::pivot_longer(-ID, names_to = "Week", values_to = "Frequency") |>
ggplot(aes(Week, Frequency, fill = Week)) +
geom_col() +
scale_y_continuous(breaks = scales::pretty_breaks()) +
facet_wrap(~ ID) +
theme_bw(base_size = 16)
Created on 2022-09-30 with reprex v2.0.2
Edit
If the week numbers mix one and two digits, the lexicographic order differs from the numeric order. For instance, after W1 comes W11, not W2. The stringr function str_sort sorts by the embedded numbers when the argument numeric = TRUE.
In the example below I reuse the data, changing W2 to W11. The correct bar order should therefore be W1, W3, W11.
library(ggplot2)
library(stringr)
ID <- c(12, 34, 560, 45, 235)
W1 <- c(0, 5, 7, 6, 0)
W11 <- c(7, 8, 9, 5, 2)
W3 <- c(0, 0, 3, 5, 9)
df <- data.frame(ID, W1, W11, W3)
df$ID <- as.factor(df$ID)
df[-1] <- lapply(df[-1], as.integer)
df |>
tidyr::pivot_longer(-ID, names_to = "Week", values_to = "Frequency") |>
dplyr::mutate(Week = factor(Week, levels = str_sort(unique(Week), numeric = TRUE))) |>
ggplot(aes(Week, Frequency, fill = Week)) +
geom_col() +
scale_y_continuous(breaks = scales::pretty_breaks()) +
facet_wrap(~ ID) +
theme_bw(base_size = 16)
Created on 2022-10-01 with reprex v2.0.2
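With around 150 IDs a single faceted plot may become crowded, and the question also asks for a loop that writes everything to one PDF. A minimal sketch of that variant follows, using one page per ID. The output path `out_file` and the page size are assumptions, not part of the original answer.

```r
library(ggplot2)

ID <- c(12, 34, 560, 45, 235)
W1 <- c(0, 5, 7, 6, 0)
W2 <- c(7, 8, 9, 5, 2)
W3 <- c(0, 0, 3, 5, 9)
df <- data.frame(ID = as.factor(ID), W1, W2, W3)

long <- tidyr::pivot_longer(df, -ID, names_to = "Week", values_to = "Frequency")

# Hypothetical output location; replace with any writable path
out_file <- file.path(tempdir(), "id_barplots.pdf")

pdf(out_file, width = 7, height = 5)  # open one multi-page PDF device
for (id in levels(long$ID)) {
  p <- ggplot(long[long$ID == id, ], aes(Week, Frequency, fill = Week)) +
    geom_col() +
    ggtitle(paste("ID", id)) +
    theme_bw(base_size = 16)
  print(p)  # inside a loop, ggplot objects must be printed explicitly
}
dev.off()   # close the device so the file is written
```

Each `print(p)` starts a new page, so the file ends up with one bar plot per ID.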

How to apply a function to a data.table subset by multiple columns in R?

I have a data table with counts for changes for multiple groups. For example:
input <- data.table(from = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
to = c(letters[1:6], letters[1:6]),
from_N = c(100, 100, 100, 50, 50, 50, 60, 60 ,60, 80, 80, 80),
to_N = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
How can I calculate the total for each change across groups? I can do this using a for loop, for example:
out <- list()
for (i in seq_along(unique(input$from))) {
  sub <- input[from == unique(input$from)[i]]
  out2 <- list()
  for (j in seq_along(unique(sub$to))) {
    sub2 <- sub[to == unique(sub$to)[j]]
    out2[[j]] <- data.table(from = sub2$from[1],
                            to = sub2$to[1],
                            from_N = sum(sub2$from_N),
                            to_N = sum(sub2$to_N))
    print(unique(sub$to)[j])
  }
  out[[i]] <- do.call("rbind", out2)
  print(unique(input$from)[i])
}
output <- do.call("rbind", out)
However, the data table I need to apply this to is very large, and I therefore need to maximise performance. Is there a data.table method? Any help will be greatly appreciated!
Perhaps I've overlooked something, but it seems you're just after:
library(data.table)
setDT(input)[, .(from_N = sum(from_N), to_N = sum(to_N)), by = .(from, to)]
Output:
from to from_N to_N
1: A a 160 20
2: A b 160 25
3: A c 160 50
4: B d 130 25
5: B e 130 10
6: B f 130 25
An option with dplyr
library(dplyr)
input %>%
group_by(from, to) %>%
summarise(across(ends_with('_N'), sum), .groups = 'drop')
Or in data.table
library(data.table)
setDT(input)[, lapply(.SD, sum), by = .(from, to), .SDcols = patterns('_N$')]

how to add together dataframes within a list but only for matching dates

I have a list of data frames that I want to consolidate into one data frame. I am looking to solve two problems:
How to add together the columns
How to only include dates common to all the dfs within the list
This is what I have:
library(tidyverse)
library(lubridate)
df1 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
"2019-02-05")),
x = c(1, 2, 3, 4, 5),
y = c(2, 3, 4, 5, 6),
z = c(3, 4, 5, 6, 7)
)
df2 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-04", "2019-02-05")),
x = c(1, 2, 3, 4),
y = c(2, 3, 4, 5),
z = c(3, 4, 5, 6)
)
df3 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04")),
x = c(1, 2, 3, 4),
y = c(2, 3, 4, 5),
z = c(3, 4, 5, 6)
)
dfl <- list(df1, df2, df3)
This is the output I am looking for:
data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-04")),
x = c(3, 6, 11),
y = c(6, 9, 14),
z = c(9, 12, 17)
)
I have tried inner_join and tried looping through the list, but it got too complicated and I still didn't manage to land on the answer.
Is there a cleaner way to get to the final answer?
How about this?
bind_rows(dfl) %>%
group_by(date) %>%
mutate(n = 1) %>%
summarise_all(sum) %>%
filter(n == length(dfl)) %>%
select(-n)
## A tibble: 3 x 4
# date x y z
# <date> <dbl> <dbl> <dbl>
#1 2019-02-01 3 6 9
#2 2019-02-02 6 9 12
#3 2019-02-04 11 14 17
This assumes that there are no duplicate dates in a single data.frame of dfl.
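That assumption can be checked before summing. Below is a base R sketch of the same idea (stack the data frames, count how often each date occurs, and sum only the dates present in every data frame), with an explicit guard against duplicate dates. The names `stacked`, `common` and `totals` are illustrative, and `as.Date` stands in for `ymd` so that the sketch needs no extra packages.

```r
df1 <- data.frame(
  date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
                   "2019-02-05")),
  x = c(1, 2, 3, 4, 5), y = c(2, 3, 4, 5, 6), z = c(3, 4, 5, 6, 7)
)
df2 <- data.frame(
  date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-04", "2019-02-05")),
  x = c(1, 2, 3, 4), y = c(2, 3, 4, 5), z = c(3, 4, 5, 6)
)
df3 <- data.frame(
  date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04")),
  x = c(1, 2, 3, 4), y = c(2, 3, 4, 5), z = c(3, 4, 5, 6)
)
dfl <- list(df1, df2, df3)

# Guard: fail early if any single data frame repeats a date
has_dupes <- vapply(dfl, function(d) anyDuplicated(d$date) > 0, logical(1))
stopifnot(!any(has_dupes))

# Stack everything and count how many data frames contain each date
stacked <- do.call(rbind, dfl)
n_per_date <- table(format(stacked$date))
common <- names(n_per_date)[n_per_date == length(dfl)]

# Sum x, y, z per date, restricted to dates present in every data frame
keep <- format(stacked$date) %in% common
totals <- aggregate(. ~ date, data = stacked[keep, ], FUN = sum)
totals
```

With the guard in place, a date count equal to `length(dfl)` can only mean the date occurs once in each data frame.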

Heterogeneous list to dataframe in R

I have a list (from JSON parsed with rjson) that defines, in a rather awkward way, the values of an element for given x and y values (as if to draw a line on a graph). I want to produce a data.frame that contains only the values, with the possible x values as row names and the possible y values as column headers.
In other words, I try to obtain this kind of data frame:
y1 y2
x1 v11 v12
x2 v21 v22
And the list I got as entry is like:
[
[
x1,
[
[y1, v11],
[y2, v12]
]
],
[
x2,
[
[y1, v21],
[y2, v22]
]
]
]
It is a list of lists; each inner list contains two elements: one x value and a list of lists. Those innermost lists have two elements, a y value and a v value, where v is the value of the object for the x of its parent and the y paired with it.
I hope my explanation is not too confusing. I have made some unsuccessful attempts with ldply or matrix(unlist(... I am looking for a way to transform my list without having to pick out each value one by one in a double for loop.
Thanks for reading and for any help you can provide.
[EDIT]
Here is the dput of my data:
list(list(20, list(c(1, 224), c(3, 330), c(5, 436), c(10, 701
), c(20, 1231), c(30, 1231))), list(10, list(c(1, 154), c(3,
207), c(5, 366), c(10, 631), c(20, 631), c(30, 631))), list(5,
list(c(1, 119), c(3, 225), c(5, 331), c(10, 331), c(20, 331
), c(30, 331))), list(1, list(c(1, 91), c(3, 91), c(5, 91
), c(10, 91), c(20, 91), c(30, 91))))
In this example, 20, 10, 5, 1 are supposed to be the future x of the data frame and 1, 3, 5, 10, 20, 30 the future y. The rest are values of the object.
The package jsonlite is able to simplify data structures when converting from JSON to R. I am not sure if rjson offers something similar. Here I use this by round-tripping from R to JSON and back, which gives me a matrix of y_i and v_ij for each x:
foo <- list(list(20, list(c(1, 224), c(3, 330), c(5, 436), c(10, 701), c(20, 1231), c(30, 1231))),
list(10, list(c(1, 154), c(3, 207), c(5, 366), c(10, 631), c(20, 631), c(30, 631))),
list(5, list(c(1, 119), c(3, 225), c(5, 331), c(10, 331), c(20, 331), c(30, 331))),
list(1, list(c(1, 91), c(3, 91), c(5, 91), c(10, 91), c(20, 91), c(30, 91))))
bar <- jsonlite::fromJSON(jsonlite::toJSON(foo))
baz <- Reduce(rbind,lapply(bar, function(x) t(x[[2]])[2, ]))
colnames(baz) <- bar[[1]][[2]][,1]
rownames(baz) <- unlist(lapply(bar, function(x) x[[1]]))
baz
#> 1 3 5 10 20 30
#> 20 224 330 436 701 1231 1231
#> 10 154 207 366 631 631 631
#> 5 119 225 331 331 331 331
#> 1 91 91 91 91 91 91
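If the list is already in R, the JSON round trip may not be needed. Here is a base R sketch that builds the same matrix directly. It assumes every x entry carries the same y values in the same order (true for the dput above), and the name `vals` is illustrative.

```r
foo <- list(list(20, list(c(1, 224), c(3, 330), c(5, 436), c(10, 701), c(20, 1231), c(30, 1231))),
            list(10, list(c(1, 154), c(3, 207), c(5, 366), c(10, 631), c(20, 631), c(30, 631))),
            list(5, list(c(1, 119), c(3, 225), c(5, 331), c(10, 331), c(20, 331), c(30, 331))),
            list(1, list(c(1, 91), c(3, 91), c(5, 91), c(10, 91), c(20, 91), c(30, 91))))

# For each x entry, take the second element (v) of every c(y, v) pair;
# sapply() returns one column per x, so transpose to get one row per x
vals <- t(sapply(foo, function(el) sapply(el[[2]], `[`, 2)))

# Row names from the x values, column names from the y values of the first entry
# (assumption: all entries share the same y values in the same order)
dimnames(vals) <- list(
  as.character(sapply(foo, `[[`, 1)),
  as.character(sapply(foo[[1]][[2]], `[`, 1))
)
vals
```

If the y values could differ between x entries, the pairs would have to be matched by y value rather than by position.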
