How to repeat the rows based on value - oracle11g

I want to return each row repeated as many times as the value in its second column.
For example:
col_1 col_2
================
A 1
B 3
C 2
Result is:
col_1 col_2
================
A 1
B 3
B 3
B 3
C 2
C 2

You may use this query.
WITH yourtable (col_1, col_2)
AS (SELECT 'A', 1 FROM DUAL
UNION ALL
SELECT 'B', 3 FROM DUAL
UNION ALL
SELECT 'C', 2 FROM DUAL)
--The data above is for test purpose only
SELECT col_1, col_2
FROM yourtable
CROSS JOIN ( SELECT ROWNUM n
FROM DUAL
CONNECT BY LEVEL <= (SELECT MAX (col_2) FROM yourtable))
WHERE n <= col_2
ORDER BY col_1;

Yet another option:
with yourtable (col_1, col_2) as
  (select 'A', 1 from dual union all
   select 'B', 3 from dual union all
   select 'C', 2 from dual
  )
select col_1, col_2
from yourtable,
     table(cast(multiset(select level from dual
                         connect by level <= col_2)
                as sys.odcinumberlist))
order by col_1;
C COL_2
- ----------
A 1
B 3
B 3
B 3
C 2
C 2
6 rows selected.
SQL>

Related

How to combine elements in a complicated list in R?

I am trying to combine elements in a list in different levels. For example, I have a list generated as below,
df1<-list(data.frame(f1= c(1:3),f2=c("a","b","c")))
df2<-list(data.frame(f1= c(4:6),f2=c("d","e","f")))
df3<-list(data.frame(f1= c(7:9),f2=c("x","y","z")))
list1 <- list("table1","comment1",df1)
list2 <- list("table2","comment2",df2)
list3 <- list("table3","comment3",df3)
list <- list(list1,list2,list3)
I want to combine the elements in the list and get a tibble like this,
table_name value
1 table1 a
2 table1 b
3 table1 c
4 table2 d
5 table2 e
6 table2 f
7 table3 x
8 table3 y
9 table3 z
I don't want to get the table by plucking one by one. Is there a simple way to do it?
Thanks!
Try the code below
do.call(
rbind,
lapply(
list,
function(v) data.frame(table_name = v[[1]], value = v[[3]][[1]]$f2)
)
)
which gives
table_name value
1 table1 a
2 table1 b
3 table1 c
4 table2 d
5 table2 e
6 table2 f
7 table3 x
8 table3 y
9 table3 z
Using map from purrr
library(purrr)
library(tibble)
map_dfr(list, ~ tibble(table_name = .x[[1]], value = .x[[3]][[1]]$f2))
-output
# A tibble: 9 x 2
# table_name value
# <chr> <chr>
#1 table1 a
#2 table1 b
#3 table1 c
#4 table2 d
#5 table2 e
#6 table2 f
#7 table3 x
#8 table3 y
#9 table3 z

How to combine multiple elements in different levels in a complicated list in R?

I am trying to combine elements in a list in different levels. For example, I have a list generated as below,
df1<-list(data.frame(f1= c(1:3),f2=c("a","b","c")))
df2<-list(data.frame(f1= c(4:6),f2=c("d","e","f")))
df3<-list(data.frame(f1= c(7:9),f2=c("x","y","z")))
list1 <- tibble("table1","comment1",df1)
list2 <- tibble("table2","comment2",df2)
list3 <- tibble("table3","comment3",df3)
list <- list(list1,list2,list3)
I want to combine the elements in the list and get a tibble like this,
table_name f1 value
1 table1 1 a
2 table1 2 b
3 table1 3 c
4 table2 4 d
5 table2 5 e
6 table2 6 f
7 table3 7 x
8 table3 8 y
9 table3 9 z
We can unnest
library(dplyr)
library(tidyr)
library(purrr)
map_dfr(list, ~
.x %>%
select(1, 3) %>%
rename_all(~ c('table_name', 'col1')) %>%
unnest(c(col1)))
-output
# A tibble: 9 x 3
# table_name f1 f2
# <chr> <int> <chr>
#1 table1 1 a
#2 table1 2 b
#3 table1 3 c
#4 table2 4 d
#5 table2 5 e
#6 table2 6 f
#7 table3 7 x
#8 table3 8 y
#9 table3 9 z

Summing up selected row duplicates

I have data that looks like this.
I wish to sum up the value column for rows that have the same name, time, and site. In this case, rows 3 and 4 would be summed, and rows 5 and 7 would be summed up.
I wish for the resulting data frame to look like this.
example data:
name = c('a', 'a', 'b' , 'b', 'c', 'c', 'c', 'd')
time = c(1,2,1,1,3,3,3,4)
site = c('A', 'A', 'A', 'A','B', 'D','B', 'E')
value = c(5,8,1,0,7,0,8,10)
mock = data.frame(name, time,site,value)
I really like the data.table way to do this:
library(data.table)
data <- as.data.table(mock)  # convert the example data frame to a data.table
data[, .(value = sum(value)), by = list(name, time, site)]
name time site value
1: a 1 A 5
2: a 2 A 8
3: b 1 A 1
4: c 3 B 15
5: c 3 D 0
6: d 4 E 10
The nice thing about data.table is that the groups keep the order in which they first appear in the data, whereas aggregate() reorders them.
Here's a tidyverse answer:
library(dplyr)
mock <- mock %>%
group_by(name, time, site) %>%
summarize(value = sum(value))
name time site value
<fct> <dbl> <fct> <dbl>
1 a 1 A 5
2 a 2 A 8
3 b 1 A 1
4 c 3 B 15
5 c 3 D 0
6 d 4 E 10
You can use base R aggregate to make it, i.e.,
> aggregate(value~.,mock,sum)
name time site value
1 a 1 A 5
2 b 1 A 1
3 a 2 A 8
4 c 3 B 15
5 c 3 D 0
6 d 4 E 10
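If the reordering done by aggregate() is a concern, the original order of appearance can be restored afterwards. The following is only a minimal base R sketch using the example data from the question; the helper key_of and the "|" separator are illustrative choices, and it assumes no value contains that separator.

```r
# Example data from the question.
mock <- data.frame(
  name  = c("a", "a", "b", "b", "c", "c", "c", "d"),
  time  = c(1, 2, 1, 1, 3, 3, 3, 4),
  site  = c("A", "A", "A", "A", "B", "D", "B", "E"),
  value = c(5, 8, 1, 0, 7, 0, 8, 10)
)

# Sum value within each (name, time, site) group.
agg <- aggregate(value ~ name + time + site, data = mock, FUN = sum)

# Hypothetical helper: one text key per row, built from the grouping columns.
key_of <- function(d) paste(d$name, d$time, d$site, sep = "|")

# Reorder the aggregated rows by where each key first appears in mock.
first_seen <- unique(key_of(mock))
agg <- agg[match(first_seen, key_of(agg)), ]
rownames(agg) <- NULL
agg
```

The match() call looks up each first-seen key in the aggregated result, so the final row order follows the input rather than the sort order of the grouping columns.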

Permutations with data.table join

Below is a table of products customer 1 has bought.
df <- data.table(customer_id = rep(1,3)
, product_1 = letters[1:3]
)
customer_id product_1
1: 1 a
2: 1 b
3: 1 c
Assume the real dataset has multiple customers. For each customer, I'd like to create the permutations of the products that customer has bought (without replacement). In combinatorics terms:
nPk
where
n = number of (distinct) products each customer has bought
k = 2
Results:
customer_id product_1 product_2
1: 1 a b
2: 1 a c
3: 1 b c
4: 1 b a
5: 1 c a
6: 1 c b
The SQL join conditions would be:
where customer_id = customer_id
and product_1 != product_1
However, I understand data.table currently has limited support for non equi joins. Therefore, is there an alternative way of achieving this?
You can eliminate the cases where product_1 and product_2 are equal after joining
df[df, on = .(customer_id = customer_id), allow.cartesian = TRUE
][product_1 != i.product_1
][order(product_1)]
customer_id product_1 i.product_1
1: 1 a b
2: 1 a c
3: 1 b a
4: 1 b c
5: 1 c a
6: 1 c b
Another option using by=.EACHI:
df[df, on=.(customer_id),
.(p1=i.product_1, p2=x.product_1[x.product_1!=i.product_1]), by=.EACHI]
output:
customer_id p1 p2
1: 1 a b
2: 1 a c
3: 1 b a
4: 1 b c
5: 1 c a
6: 1 c b
Using the same logic as @Humpelstielzchen, in dplyr we can use full_join
library(dplyr)
full_join(df, df, by = "customer_id") %>% filter(product_1.x != product_1.y)
# customer_id product_1.x product_1.y
#1 1 a b
#2 1 a c
#3 1 b a
#4 1 b c
#5 1 c a
#6 1 c b
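The same join-then-filter idea can also be sketched in base R with merge(), without any packages. In this sketch the second customer and the column name product_2 are made up for illustration, to show that pairs are only formed within a customer.

```r
# Customer 1 is from the question; customer 2 is an invented extra case.
df <- data.frame(
  customer_id = c(1, 1, 1, 2, 2),
  product_1   = c("a", "b", "c", "x", "y"),
  stringsAsFactors = FALSE
)

# Copy of the table with the product column renamed, to avoid a name clash.
second <- setNames(df, c("customer_id", "product_2"))

# Self-join on customer_id: every product paired with every product
# bought by the same customer.
pairs <- merge(df, second, by = "customer_id")

# Drop the pairs where a product is matched with itself.
pairs <- pairs[pairs$product_1 != pairs$product_2, ]
rownames(pairs) <- NULL
pairs
```

With n distinct products per customer this leaves n * (n - 1) rows for that customer, which is nPk for k = 2.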

Remove row if it is same as previous row, with exception of one column

I have the following dataframe
x <- data.frame(id = c(1:6),
a = c('a', 'b', 'b', 'a', 'a', 'c'),
b = rep(2, 6),
c = c(5, 4, 4, 5, 5, 2))
> x
id a b c
1 1 a 2 5
2 2 b 2 4
3 3 b 2 4
4 4 a 2 5
5 5 a 2 5
6 6 c 2 2
I want to end up with
id a b c
1 1 a 2 5
2 2 b 2 4
4 4 a 2 5
6 6 c 2 2
The requirement is to remove a row if it is the same as the previous row in every column except id. If it matches a row further up the data frame, but not the one immediately before it, I do not want to remove it. For example, the row with id 4 is the same as the row with id 1 but is not removed, because id 1 is not immediately above it.
Any help would be appreciated
We can use base R
x[!c(FALSE, !rowSums(x[-1, -1] != x[-nrow(x), -1])),]
# id a b c
#1 1 a 2 5
#2 2 b 2 4
#4 4 a 2 5
#6 6 c 2 2
Here is a way using the lag function in dplyr. The idea is to create a key column and check whether each key is the same as the previous one.
library(dplyr)
x %>%
mutate(key=paste(a, b, c, sep="|")) %>%
filter(key != lag(key, default="0")) %>%
select(-key)
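For comparison, the key idea can be sketched in base R as well. This is only an illustration using the data frame from the question; the names key, keep and result are arbitrary, and it assumes the "|" separator does not occur inside any value.

```r
# Data frame from the question.
x <- data.frame(id = 1:6,
                a = c("a", "b", "b", "a", "a", "c"),
                b = rep(2, 6),
                c = c(5, 4, 4, 5, 5, 2))

# One text key per row, built from every column except id.
key <- do.call(paste, c(x[setdiff(names(x), "id")], sep = "|"))

# Always keep the first row; keep a later row only if its key differs
# from the key of the row immediately before it.
keep <- c(TRUE, key[-1] != key[-length(key)])

result <- x[keep, ]
result
```

Because each key is compared only with its direct predecessor, a row that repeats an earlier but non-adjacent row (such as id 4 repeating id 1) is kept.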
