How to convert intervals with values to individual positions in R

I am struggling a bit with the following problem:
I have table A (below) and would like to convert the intervals defined there into individual positions, as in table B. For each position, the value should be the sum of the values of all intervals in table A that cover it (each interval runs from Start to End), the interval's own value if only one interval covers it, or 0 if no interval covers it. I would prefer a solution in R and would really appreciate your help.
Table A
ID Start End Value
1 1 5 9
2 3 7 5
3 5 9 13
4 11 15 1
5 12 16 18
6 14 18 21
Convert to this Table B
Position Value
1 9
2 9
3 14
4 14
5 27
6 18
7 18
8 13
9 13
10 0
11 15
12 33
13 33
14 54
15 54
16 39
17 21
18 21

Not a very straightforward way, but it gets the job done:
df<-structure(list(ID = 1:6, Start = c(1L, 3L, 5L, 11L, 12L, 14L),
End = c(5L, 7L, 9L, 15L, 16L, 18L),
Value = c(9L, 5L, 13L, 1L, 18L, 21L)), .Names = c("ID", "Start", "End", "Value"),
class = "data.frame", row.names = c(NA,
-6L))
# build a (position, value) matrix for each interval
s1 <- lapply(seq_len(nrow(df)), function(i) {
  cbind(df$Start[i]:df$End[i], df$Value[i])
})
s2 <- as.data.frame(do.call(rbind, s1))
# sum the values at each position
library(dplyr)
wgaps <- summarise(group_by(s2, V1), V2 = sum(V2))
# create a sequence with no gaps in it and join the sums onto it
nogaps <- data.frame(Position = seq(min(wgaps$V1), max(wgaps$V1)))
nogaps <- left_join(nogaps, wgaps, by = c("Position" = "V1"))
names(nogaps) <- c("Position", "value") # rename
nogaps$value[is.na(nogaps$value)] <- 0 # positions covered by no interval get 0
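For comparison, here is a minimal base R sketch of the same idea, not the answer's method: for every position between the smallest Start and the largest End, sum the values of the intervals that cover it. The names positions, totals and table_b are made up for this illustration, and the data frame is the one from the answer above.

```r
# Same input as in the answer above
df <- data.frame(
  ID    = 1:6,
  Start = c(1L, 3L, 5L, 11L, 12L, 14L),
  End   = c(5L, 7L, 9L, 15L, 16L, 18L),
  Value = c(9L, 5L, 13L, 1L, 18L, 21L)
)

# Every position from the first Start to the last End, gaps included
positions <- seq(min(df$Start), max(df$End))

# For each position, sum the values of all intervals covering it;
# sum() of an empty selection is 0, so uncovered positions get 0
totals <- sapply(positions, function(p) {
  sum(df$Value[df$Start <= p & df$End >= p])
})

table_b <- data.frame(Position = positions, Value = totals)
```

This checks every interval for every position, so it is only reasonable for small inputs like this one; for many or long intervals the grouped approach above should scale better.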

Related

Tidyverse method for combining sets of columns based on a condition in the column names

Imagine I have the following columns (among others) in my dataframe (credit to Allan for creating the sample data):
df <- structure(list(a_years = 5:3, a_months = 6:8, a_days = c(23L,
20L, 15L), b_years = c(4L, 5L, 3L), b_months = 0:2, b_days = c(10L,
8L, 6L), c_years = 8:6, c_months = c(11L, 9L, 8L), c_days = c(26L,
19L, 18L)), class = "data.frame", row.names = c(NA, -3L))
df
#> a_years a_months a_days b_years b_months b_days c_years c_months c_days
#> 1 5 6 23 4 0 10 8 11 26
#> 2 4 7 20 5 1 8 7 9 19
#> 3 3 8 15 3 2 6 6 8 18
And I want to combine columns that start with the same grouping key (in this case the letter at the beginning, but in my data it's a longer expression) such that I get columns a_days, b_days, c_days and so on, with the values in each column equal to x_years * 365 + x_months * 30 + x_days for each group (a, b, c, d, e and so on) of columns.
Is there a way to accomplish this all at once? Some combination of map() and mutate() comes to mind, or maybe using case_when(), but I can't quite figure it out. Thanks for any guidance you can offer!
You can do this with across inside transmute:
library(dplyr)
df %>%
  transmute(across(contains("days"), ~ .x) +
            across(contains("months"), ~ .x * 30) +
            across(contains("years"), ~ .x * 365))
#> a_days b_days c_days
#> 1 2028 1470 3276
#> 2 1690 1863 2844
#> 3 1350 1161 2448
Sample data
df <- structure(list(a_years = 5:3, a_months = 6:8, a_days = c(23L,
20L, 15L), b_years = c(4L, 5L, 3L), b_months = 0:2, b_days = c(10L,
8L, 6L), c_years = 8:6, c_months = c(11L, 9L, 8L), c_days = c(26L,
19L, 18L)), class = "data.frame", row.names = c(NA, -3L))
df
#> a_years a_months a_days b_years b_months b_days c_years c_months c_days
#> 1 5 6 23 4 0 10 8 11 26
#> 2 4 7 20 5 1 8 7 9 19
#> 3 3 8 15 3 2 6 6 8 18
Created on 2022-09-29 with reprex v2.0.2
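The across() approach adds the three sets of columns by position, so it relies on the days, months and years columns appearing in the same group order. As a hedged alternative, here is a base R sketch that matches columns by their prefix explicitly. It assumes every group has all three suffixes (_years, _months, _days); the names prefixes, total_days and result are made up for this illustration.

```r
# Same sample data as above
df <- data.frame(
  a_years = 5:3, a_months = 6:8, a_days = c(23L, 20L, 15L),
  b_years = c(4L, 5L, 3L), b_months = 0:2, b_days = c(10L, 8L, 6L),
  c_years = 8:6, c_months = c(11L, 9L, 8L), c_days = c(26L, 19L, 18L)
)

# Grouping keys: column names with the unit suffix stripped
prefixes <- unique(sub("_(years|months|days)$", "", names(df)))

# For each key, look up its three columns by name and convert to days
total_days <- sapply(prefixes, function(p) {
  df[[paste0(p, "_years")]] * 365 +
    df[[paste0(p, "_months")]] * 30 +
    df[[paste0(p, "_days")]]
})

result <- as.data.frame(total_days)
names(result) <- paste0(prefixes, "_days")
```

Because the lookup is by name, the column order in df does not matter, and a longer grouping key works the same way as long as the suffix pattern holds.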

Multiple graphs with r

I have two datasets that have exactly the same columns and exactly the same products, but the first table contains the Forecast and the second one contains the actual sales.
I would like to create for each product a WoW line graph showing one line for sales and one line in another color for the FC. I am building an rmd html file and would like to have in this case 4 graphs on the page.
Unfortunately I don't have only 4 products in real life and I cannot write a code for every product.
Is there a way to turn these two tables into four (or more) plots, each with a WoW FC line and a WoW sales line, using simple code?
Thank you!
Forecast:
Product CW1 CW2 CW3 CW4
A       9   12  21  8
B       7   5   6   9
C       10  10  20  15
D       10  9   8   8
Actual sales:
Product CW1 CW2 CW3 CW4
A       10  11  21  7
B       10  5   7   9
C       9   10  21  15
D       10  10  9   8
I think it would be best to bind your two data frames row-wise (after labelling each with a column that records which original data frame it came from), then pivot the result into long format so that all the CW values are in a single column, with a new column recording which "CW" each value came from. This new data frame contains all the information you need to create a simple line plot in ggplot2, with one facet per product:
library(ggplot2)
library(dplyr)
library(tidyr)
bind_rows(list(sales = sales, forecast = forecast), .id = "type") %>%
  pivot_longer(cols = starts_with("CW")) %>%
  ggplot(aes(name, value, color = type, group = type)) +
  geom_line() +
  scale_color_discrete(name = "") +
  facet_wrap(.~Product, ncol = 2)
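To make the intermediate long-format table concrete, here is a small base R sketch that builds an equivalent data frame without dplyr or tidyr. It is only an illustration of the reshaping step, not part of the plotting code; the helper to_long and the variable long are made up here, while the column names type, name and value mirror what the pipeline above produces.

```r
forecast <- data.frame(
  Product = c("A", "B", "C", "D"),
  CW1 = c(9L, 7L, 10L, 10L), CW2 = c(12L, 5L, 10L, 9L),
  CW3 = c(21L, 6L, 20L, 8L), CW4 = c(8L, 9L, 15L, 8L)
)
sales <- data.frame(
  Product = c("A", "B", "C", "D"),
  CW1 = c(10L, 10L, 9L, 10L), CW2 = c(11L, 5L, 10L, 10L),
  CW3 = c(21L, 7L, 21L, 9L), CW4 = c(7L, 9L, 15L, 8L)
)

# Hypothetical helper: stack all CW columns of one table into long format
to_long <- function(d, type) {
  cw_cols <- grep("^CW", names(d), value = TRUE)
  data.frame(
    type    = type,                                   # source table label
    Product = rep(d$Product, times = length(cw_cols)),
    name    = rep(cw_cols, each = nrow(d)),           # which CW column
    value   = unlist(d[cw_cols], use.names = FALSE)   # stacked column by column
  )
}

# One row per (type, Product, week): 2 tables x 4 products x 4 weeks
long <- rbind(to_long(sales, "sales"), to_long(forecast, "forecast"))
```

Each facet in the plot then corresponds to one value of Product, and each line within a facet to one value of type, so adding more products requires no extra code.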
Data
forecast <- structure(list(Product = c("A", "B", "C", "D"), CW1 = c(9L, 7L,
10L, 10L), CW2 = c(12L, 5L, 10L, 9L), CW3 = c(21L, 6L, 20L, 8L
), CW4 = c(8L, 9L, 15L, 8L)), class = "data.frame", row.names = c(NA,
-4L))
sales <- structure(list(Product = c("A", "B", "C", "D"), CW1 = c(10L,
10L, 9L, 10L), CW2 = c(11L, 5L, 10L, 10L), CW3 = c(21L, 7L, 21L,
9L), CW4 = c(7L, 9L, 15L, 8L)), class = "data.frame", row.names = c(NA,
-4L))
forecast
#> Product CW1 CW2 CW3 CW4
#> 1 A 9 12 21 8
#> 2 B 7 5 6 9
#> 3 C 10 10 20 15
#> 4 D 10 9 8 8
sales
#> Product CW1 CW2 CW3 CW4
#> 1 A 10 11 21 7
#> 2 B 10 5 7 9
#> 3 C 9 10 21 15
#> 4 D 10 10 9 8

How to sum rows based on exact conditions on multiple columns and save edited rows in original dataset? [duplicate]

There are 3 parts to this problem:
1) I want to sum the values in columns b, c, d for any two adjacent rows that have the same values in those columns.
2) I would like to keep the values in the other columns unchanged. (Some other columns, e.g. a, may contain character data.)
3) I would like to keep the changes by replacing the original values in columns b, c, d in the first row (of the two matching rows) with the new values (the sums) and deleting the second row.
Time a b c d id
1 2014/10/11 A 40 20 10 1
2 2014/10/12 A 40 20 10 2
3 2014/10/13 B 9 10 9 3
4 2014/10/14 D 16 5 12 4
5 2014/10/15 D 1 6 5 5
6 2014/10/16 B 20 7 8 6
7 2014/10/17 B 20 7 8 7
8 2014/10/18 A 11 9 5 8
9 2014/10/19 C 31 20 23 9
Expected outcome:
Time a b c d id
1 2014/10/11 A 80 40 20 1 *
3 2014/10/13 B 9 10 9 3
4 2014/10/14 D 16 5 12 4
5 2014/10/15 D 1 6 5 5
6 2014/10/16 B 40 14 16 6 *
8 2014/10/18 A 11 9 5 8
9 2014/10/19 C 31 20 23 9
id 1 and 2 combined to become id 1; id 6 and 7 combined to become id 6.
Thank you. Any contribution is greatly appreciated.
Using dplyr functions along with data.table::rleid. To detect adjacent rows with the same values in columns b, c and d, we paste those values together and use rleid to create a group id for each run of identical keys. For each group we sum the values in columns b, c and d and keep only the first row.
library(dplyr)
df %>%
  mutate(temp_col = paste(b, c, d, sep = "-")) %>%
  group_by(group = data.table::rleid(temp_col)) %>%
  mutate_at(vars(b, c, d), sum) %>%
  slice(1L) %>%
  ungroup() %>%
  select(-temp_col, -group)
# Time a b c d id
# <fct> <fct> <int> <int> <int> <int>
#1 2014/10/11 A 80 40 20 1
#2 2014/10/13 B 9 10 9 3
#3 2014/10/14 D 16 5 12 4
#4 2014/10/15 D 1 6 5 5
#5 2014/10/16 B 40 14 16 6
#6 2014/10/18 A 11 9 5 8
#7 2014/10/19 C 31 20 23 9
data
df <- structure(list(Time = structure(1:9, .Label = c("2014/10/11",
"2014/10/12", "2014/10/13", "2014/10/14", "2014/10/15", "2014/10/16",
"2014/10/17", "2014/10/18", "2014/10/19"), class = "factor"),
a = structure(c(1L, 1L, 2L, 4L, 4L, 2L, 2L, 1L, 3L), .Label = c("A",
"B", "C", "D"), class = "factor"), b = c(40L, 40L, 9L, 16L,
1L, 20L, 20L, 11L, 31L), c = c(20L, 20L, 10L, 5L, 6L, 7L,
7L, 9L, 20L), d = c(10L, 10L, 9L, 12L, 5L, 8L, 8L, 5L, 23L
), id = 1:9), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9"))
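If data.table is not available, the run ids can also be built in base R. The following is a sketch under the same assumption as the answer (every run of identical adjacent b, c, d values is collapsed, whatever its length); the names key, run_id and result are made up for this illustration.

```r
df <- data.frame(
  Time = c("2014/10/11", "2014/10/12", "2014/10/13", "2014/10/14", "2014/10/15",
           "2014/10/16", "2014/10/17", "2014/10/18", "2014/10/19"),
  a  = c("A", "A", "B", "D", "D", "B", "B", "A", "C"),
  b  = c(40L, 40L, 9L, 16L, 1L, 20L, 20L, 11L, 31L),
  c  = c(20L, 20L, 10L, 5L, 6L, 7L, 7L, 9L, 20L),
  d  = c(10L, 10L, 9L, 12L, 5L, 8L, 8L, 5L, 23L),
  id = 1:9
)

# One key per row; a new run starts wherever the key differs from the previous row
key    <- paste(df$b, df$c, df$d, sep = "-")
run_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Keep the first row of each run, then overwrite b, c, d with the run sums
result <- df[!duplicated(run_id), ]
for (col in c("b", "c", "d")) {
  result[[col]] <- as.vector(tapply(df[[col]], run_id, sum))
}
```

The cumsum() over the "key changed" flags plays the role of rleid here: it increases by one each time the pasted key differs from the row before it.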

Create a new data frame column that is a combination of other columns

I have 3 columns a, b, c and I want to combine them into a new column based on the value of column mode, as follows:
if mode = 1, data from a
if mode = 2, data from b
if mode = 3, data from c
example
mode a b c
1 2 3 4
1 5 53 14
3 2 31 24
2 12 13 44
1 20 30 40
Output
mode a b c combine
1 2 3 4 2
1 5 53 14 5
3 2 31 24 24
2 12 13 44 13
1 20 30 40 20
We can use row/column matrix indexing to get the values from the dataset. Here, the row sequence (seq_len(nrow(df1))) and the column index ('mode') are combined with cbind into a two-column matrix, which is then used to extract the corresponding values from the subset of columns a, b and c:
df1$combine <- df1[2:4][cbind(seq_len(nrow(df1)), df1$mode)]
df1$combine
#[1] 2 5 24 13 20
data
df1 <- structure(list(mode = c(1L, 1L, 3L, 2L, 1L), a = c(2L, 5L, 2L,
12L, 20L), b = c(3L, 53L, 31L, 13L, 30L), c = c(4L, 14L, 24L,
44L, 40L)), class = "data.frame", row.names = c(NA, -5L))
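To show what the index matrix looks like, here is a small sketch that spells the step out. It converts the value columns to a plain matrix first, which is an assumption made for clarity rather than what the one-liner above does; idx, m and vals are names made up for this illustration.

```r
df1 <- data.frame(
  mode = c(1L, 1L, 3L, 2L, 1L),
  a = c(2L, 5L, 2L, 12L, 20L),
  b = c(3L, 53L, 31L, 13L, 30L),
  c = c(4L, 14L, 24L, 44L, 40L)
)

# Two-column index: row i paired with the column number stored in mode[i]
idx <- cbind(row = seq_len(nrow(df1)), col = df1$mode)

# Indexing a matrix with a two-column matrix picks one cell per index row
m    <- as.matrix(df1[c("a", "b", "c")])
vals <- m[idx]

df1$combine <- vals
```

Row 3 of idx is (3, 3), so the third value comes from column c; row 4 is (4, 2), so the fourth comes from column b.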
Another solution in base R that works by converting "mode" to letters then extracting those values in the matching columns.
df1$combine <- diag(as.matrix(df1[, letters[df1$mode]]))
Also, two ways with dplyr. Nested if_else():
library(dplyr)
df1 %>%
  mutate(combine =
    if_else(mode == 1, a,
      if_else(mode == 2, b, c)
    )
  )
And case_when():
df1 %>%
  mutate(combine =
    case_when(mode == 1 ~ a, mode == 2 ~ b, mode == 3 ~ c)
  )

Handling ties in finding index of n th maximum value in R

I am working on a dataframe and trying to find the index of nth maximum value (n varies by a loop), however, in the columns I have tied values and the program throws an error. Below is a sample dataset. I am basically trying to generate a similar dataframe, but with only the index values of all the values in the column vector of the dataframe.
For the output DF, column 1 in the output DF will have index values of elements of Refer_1, so Output_DF[1,1] will have the index for highest value, while Output_DF[10,1] will have the index of lowest value. Below is the input DF.
Input
1 17
2 21
3 13
4 26
5 204
6 36
7 14
8 25
9 45
10 37
Output (index values)
5
9
10
6
4
8
2
1
7
3
I am currently using which, unlist and partial together to get the indexes, however, I am unable to rectify the error. Note that the ties can occur with any nth maximum value (not necessarily the column maxima).
which(Consolidated_data_new[,i]==unlist(sort(Consolidated_data_new[,i],partial=j)[j]))
Please note that I want the code to return only one value at a time, and handle the 2nd tied value in the next loop iteration.
Please help solve this.
library(data.table)
DT <- data.table(Refer_1 = c(11L, 15L, 7L, 19L, 104L, 24L, 11L,
22L, 39L, 19L), Refer_2 = c(17L, 21L, 13L, 25L, 204L, 36L, 14L,
25L, 45L, 37L))
DT[,lapply(.SD, order,decreasing=TRUE)]
Refer_1 Refer_2
1: 5 5
2: 9 9
3: 6 10
4: 8 6
5: 4 4
6: 10 8
7: 2 2
8: 1 1
9: 7 7
10: 3 3
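For the loop described in the question, where only one index is wanted per iteration, the ordering can be computed once and then indexed by n. Below is a minimal sketch on the Refer_1 values; the names x, ord and nth_max_index are made up for this illustration.

```r
x <- c(11L, 15L, 7L, 19L, 104L, 24L, 11L, 22L, 39L, 19L)  # Refer_1

# Indices of x sorted from largest to smallest value; tied values each
# get their own slot, so every n maps to exactly one index
ord <- order(x, decreasing = TRUE)

# Hypothetical helper: index of the n-th largest value
nth_max_index <- function(n) ord[n]

for (n in seq_along(x)) {
  i <- nth_max_index(n)
  # ... use i here; the two 19s (rows 4 and 10) arrive at n = 5 and n = 6
}
```

Unlike which(x == value), which returns every tied position at once, indexing into ord returns a single position per iteration, and the second tied element is handled on the next pass.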
Your comments suggest you are working with a dataframe that has more than one column and that you want an output dataframe that has the results of order with decreasing=TRUE applied to every column:
> DF[2] <- sample(1:300, 10)
> DF[3] <- sample(1:300, 10)
> DF
Input V2 V3
1 17 210 3
2 21 72 4
3 13 263 1
4 26 249 6
5 204 223 10
6 36 83 7
7 14 107 2
8 25 295 5
9 45 198 9
10 37 112 8
> ordDF <- as.data.frame(lapply(DF, order, decreasing=TRUE))
> names(ordDF) <- paste0("res", 1:length(DF) )
> ordDF
res1 res2 res3
1 5 8 4
2 9 3 9
3 10 4 2
4 6 5 7
5 4 1 10
6 8 9 8
7 2 10 1
8 1 7 6
9 7 6 3
10 3 2 5
> dput(ordDF)
structure(list(res1 = c(5L, 9L, 10L, 6L, 4L, 8L, 2L, 1L, 7L,
3L), res2 = c(8L, 3L, 4L, 5L, 1L, 9L, 10L, 7L, 6L, 2L), res3 = c(4L,
9L, 2L, 7L, 10L, 8L, 1L, 6L, 3L, 5L)), .Names = c("res1", "res2",
"res3"), row.names = c(NA, -10L), class = "data.frame")
