I have a Teradata table like this:
Group_categ id
A 1
A 2
A 3
A 5
A 8
A 9
B 11
C 1
C 2
C 3
C 4
I need to turn it into:
Group_categ min_id max_id
A 1 2
A 3 5
A 8 9
B 11 11
C 1 2
C 3 4
It seems you want to combine each pair of consecutive rows into a single row:
SELECT Group_categ, Min(id) AS min_id, Max(id) AS max_id
FROM
 (
   SELECT
      Group_categ, id,
      -- assign the same value to two consecutive rows: 0,0,1,1,2,2,..
      -- -> used in the outer GROUP BY
      (Row_Number() Over (PARTITION BY Group_categ ORDER BY id)-1) / 2 AS grp
   FROM mytab
 ) AS dt
GROUP BY Group_categ, grp
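To try the pairing trick without a Teradata system, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is an assumption made for illustration (it needs window-function support, available from SQLite 3.25), and the ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# Sample rows from the question; the table name mytab follows the answer's query.
rows = [("A", 1), ("A", 2), ("A", 3), ("A", 5), ("A", 8), ("A", 9),
        ("B", 11), ("C", 1), ("C", 2), ("C", 3), ("C", 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (Group_categ TEXT, id INTEGER)")
conn.executemany("INSERT INTO mytab VALUES (?, ?)", rows)

query = """
SELECT Group_categ, MIN(id) AS min_id, MAX(id) AS max_id
FROM (
    SELECT Group_categ, id,
           -- integer division turns row numbers 1,2,3,4,... into 0,0,1,1,...
           (ROW_NUMBER() OVER (PARTITION BY Group_categ ORDER BY id) - 1) / 2 AS grp
    FROM mytab
) AS dt
GROUP BY Group_categ, grp
ORDER BY Group_categ, min_id
"""

pairs = conn.execute(query).fetchall()
for row in pairs:
    print(row)
```

A group with an odd number of rows, such as B here, ends with a pair whose min_id and max_id are the same value.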
I have a flagging rule that I need to apply.
Here is what my dataset looks like:
df <- data.frame(id = c(1,1,1,1, 2,2,2,2, 3,3,3,3),
key = c("a","a","b","c", "a","b","c","d", "a","b","c","c"),
form = c("A","B","A","A", "A","A","A","A", "B","B","B","A"))
> df
id key form
1 1 a A
2 1 a B
3 1 b A
4 1 c A
5 2 a A
6 2 b A
7 2 c A
8 2 d A
9 3 a B
10 3 b B
11 3 c B
12 3 c A
I would like to flag ids based on the key column, which has duplicates; the third column, form, shows which form each key came from. The idea is to find out whether an id has taken items from multiple forms. I need to add a flag column as below:
> df.1
id key form type
1 1 a A multiple
2 1 a B multiple
3 1 b A multiple
4 1 c A multiple
5 2 a A single
6 2 b A single
7 2 c A single
8 2 d A single
9 3 a B multiple
10 3 b B multiple
11 3 c B multiple
12 3 c A multiple
Eventually I need to get rid of the extra duplicated row that has a different form. To decide which of the duplicates to drop, I keep whichever form has more items for that id.
In a final separate dataset, I would like to have something like below:
> df.2
id key form type
1 1 a A multiple
3 1 b A multiple
4 1 c A multiple
5 2 a A single
6 2 b A single
7 2 c A single
8 2 d A single
9 3 a B multiple
10 3 b B multiple
11 3 c B multiple
The first id has form A dominant, so the A rows are kept; the third id has form B dominant, so the B rows are kept.
Any ideas?
Thanks!
We can check the number of distinct forms per id to create the new column, and then filter each group down to its most frequent form (the mode):
library(dplyr)
df.2 <- df %>%
  group_by(id) %>%
  mutate(type = if(n_distinct(form) > 1) 'multiple' else 'single') %>%
  filter(form == Mode(form)) %>%
  ungroup()
-output
> df.2
# A tibble: 10 × 4
id key form type
<dbl> <chr> <chr> <chr>
1 1 a A multiple
2 1 b A multiple
3 1 c A multiple
4 2 a A single
5 2 b A single
6 2 c A single
7 2 d A single
8 3 a B multiple
9 3 b B multiple
10 3 c B multiple
where Mode is a helper that has to be defined before running the pipeline:
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
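For readers who want to check the logic outside R, the same two steps (flag each id as single or multiple, then keep only the rows of its most frequent form) can be sketched in plain Python. The function name flag_and_filter is hypothetical, and the tie handling (the first form seen wins) is an assumption, since the question does not say what should happen when two forms are equally frequent.

```python
from collections import Counter, defaultdict

# (id, key, form) rows from the question
rows = [(1, "a", "A"), (1, "a", "B"), (1, "b", "A"), (1, "c", "A"),
        (2, "a", "A"), (2, "b", "A"), (2, "c", "A"), (2, "d", "A"),
        (3, "a", "B"), (3, "b", "B"), (3, "c", "B"), (3, "c", "A")]


def flag_and_filter(rows):
    """Add a type flag per id and keep only rows of the id's dominant form."""
    forms_by_id = defaultdict(list)
    for id_, _key, form in rows:
        forms_by_id[id_].append(form)

    result = []
    for id_, key, form in rows:
        counts = Counter(forms_by_id[id_])
        # most_common(1) returns the most frequent form; on ties the form
        # that was seen first is returned (assumption, see lead-in)
        dominant = counts.most_common(1)[0][0]
        type_ = "multiple" if len(counts) > 1 else "single"
        if form == dominant:
            result.append((id_, key, form, type_))
    return result


df2 = flag_and_filter(rows)
for row in df2:
    print(row)
```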
I have to retrieve IDs for employees who have completed the minimum number of jobs. There are multiple employees who have completed 1 job. My current sqldf query retrieves only 1 row of data, while there are multiple employee IDs who have completed just 1 job. Why does it stop at the first minimum value? And how do I fetch all rows with the minimum value in a column? Here is a data sample:
ID TaskCount
1 74
2 53
3 10
4 5
5 1
6 1
7 1
The code I have used:
sqldf("select id, min(taskcount) as Jobscompleted
from (select id,count(id) as taskcount
from MyData
where id is not null
group by id order by id)")
The output is:
ID Jobscompleted
5 1
While what I want is all the rows with minimum jobs completed.
ID Jobscompleted
5 1
6 1
7 1
Without a GROUP BY clause, an aggregate such as min(...) collapses its input to a single row, so the query returns only one ID. Filter on the minimum with a subquery instead:
sqldf("select ID, TaskCount TasksCompleted from MyData
where TaskCount = (select min(TaskCount) from MyData)")
giving:
ID TasksCompleted
1 5 1
2 6 1
3 7 1
Note: The input in reproducible form is:
Lines <- "
ID TaskCount
1 74
2 53
3 10
4 5
5 1
6 1
7 1"
MyData <- read.table(text = Lines, header = TRUE)
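The difference between the two queries can also be reproduced without R. This is a small sketch using Python's sqlite3 module (sqldf uses SQLite by default, but running it from Python here is only an illustration); the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (ID INTEGER, TaskCount INTEGER)")
conn.executemany(
    "INSERT INTO MyData VALUES (?, ?)",
    [(1, 74), (2, 53), (3, 10), (4, 5), (5, 1), (6, 1), (7, 1)],
)

# An aggregate without GROUP BY collapses the table to exactly one row.
single = conn.execute("SELECT min(TaskCount) FROM MyData").fetchall()

# Comparing each row against that single value keeps every tied row.
all_min = conn.execute(
    """
    SELECT ID, TaskCount AS TasksCompleted
    FROM MyData
    WHERE TaskCount = (SELECT min(TaskCount) FROM MyData)
    ORDER BY ID
    """
).fetchall()

print(single)
print(all_min)
```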
As an alternative to sqldf, you could use data.table:
library(data.table)
dt <- data.table(ID=1:7, TaskCount=c(74, 53, 10, 5, 1, 1, 1))
dt[TaskCount==min(TaskCount)]
## ID TaskCount
## 1: 5 1
## 2: 6 1
## 3: 7 1
I have a data.frame that looks like the one below. I need to replace the values in the first column based on the values in the second column. The replacement has to continue the numbering of column 1, and values in column 1 should only be replaced where ValB != "A".
>df1
ValA ValB
1 A
1 A
2 A
2 A
3 A
3 A
4 A
4 A
1 B
1 B
1 B
2 B
2 B
3 B
4 B
4 B
1 C
1 C
2 C
2 C
3 C
3 C
4 C
1 C
What I want is to replace the values in ValA, using ValB as the index for the replacement. The replacement has to continue the values in ValA: where ValA is 1 and ValB == "B", ValA becomes 5, 2 becomes 6, and so on. The desired output below should make this easier to understand. I could write a for loop with if and else if statements, but I am sure there is a cleaner way.
Desired output
>df1
ValA ValB
1 A
1 A
2 A
2 A
3 A
3 A
4 A
4 A
5 B
5 B
5 B
6 B
6 B
6 B
7 B
7 B
8 C
8 C
9 C
9 C
10 C
10 C
11 C
12 C
You could do something like this. It runs a cumulative sum over a logical vector that marks, for each row, whether ValA or ValB differs from the previous row:
# running sum: the counter goes up by 1 every time (ValA, ValB) changes
df1$c = cumsum(
  c(
    # the first value of the result is the first value of ValA
    df1$ValA[1],
    # compare rows 2..n with rows 1..(n - 1)
    sapply(
      2:nrow(df1),
      function(index) {
        # TRUE (counted as 1) if ValA or ValB changed, FALSE (0) otherwise
        !(
          df1$ValA[index] == df1$ValA[index - 1] &
          df1$ValB[index] == df1$ValB[index - 1]
        )
      }
    )
  ))
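The idea behind the cumulative sum (start from the first ValA and add 1 whenever the pair of values changes) can be sketched in a few lines of Python. The function name renumber_runs and the shortened sample are illustrative assumptions, not part of the original answer.

```python
def renumber_runs(rows):
    """Start at the first ValA and add 1 each time (ValA, ValB) changes."""
    if not rows:
        return []
    out = [rows[0][0]]
    for prev, cur in zip(rows, rows[1:]):
        # a change in either column starts a new run and bumps the counter
        out.append(out[-1] + (1 if cur != prev else 0))
    return out


# shortened (ValA, ValB) sample in the same shape as the question's data
sample = [(1, "A"), (1, "A"), (2, "A"), (2, "A"),
          (1, "B"), (1, "B"), (2, "B"), (1, "C")]
new_vals = renumber_runs(sample)
print(new_vals)  # [1, 1, 2, 2, 3, 3, 4, 5]
```

Because the counter only reacts to changes between neighbouring rows, the result depends on the row order, so the data should be sorted the way the question shows before applying it.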
I would like to delete old data from a database table, keeping only the last 2 records per ID. For example, I have a table with the following records:
ID TIME DATA
1 2 3
1 3 4
1 4 5
2 2 3
2 3 4
2 4 5
2 5 6
Result which I would like to make is (it must be sorted by TIME):
ID TIME DATA
1 3 4
1 4 5
2 4 5
2 5 6
Thank you for your help.
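A solution could be the following, which selects the rows to keep (the two most recent per ID), sorted by TIME within each ID:
select * from tab where (
   select count(*) from tab as t
   where t.ID = tab.ID and t.TIME >= tab.TIME
) <= 2
order by ID, TIME;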
For more details, see:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
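Since the question asks for a delete rather than a select, here is one possible sketch of turning the counting condition into a DELETE. It assumes SQLite (used here through Python's sqlite3 module, relying on SQLite's implicit rowid column) and assumes that TIME is unique within each ID; with tied TIME values the count would behave differently. Other databases may need a different formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (ID INTEGER, TIME INTEGER, DATA INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 3, 4), (1, 4, 5),
     (2, 2, 3), (2, 3, 4), (2, 4, 5), (2, 5, 6)],
)

# Rows to keep: at most 2 rows of the same ID have a TIME >= this row's TIME.
# Everything whose rowid is not in that set gets deleted.
conn.execute(
    """
    DELETE FROM tab
    WHERE rowid NOT IN (
        SELECT k.rowid
        FROM tab AS k
        WHERE (SELECT count(*)
               FROM tab AS t
               WHERE t.ID = k.ID AND t.TIME >= k.TIME) <= 2
    )
    """
)

remaining = conn.execute(
    "SELECT ID, TIME, DATA FROM tab ORDER BY ID, TIME"
).fetchall()
for row in remaining:
    print(row)
```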
I am trying to find values which are the same as at least two values above it. Please take a look.
id number
1 2
2 6
3 7
4 7
5 7
6 1
7 2
8 4
9 7
So in this case select would return:
ID NUMBER
3 7
4 7
5 7
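You can look up values in other rows with a correlated subquery. Note that this compares each row only with the two rows before it, so on the sample data it returns just id 5, the last row of the run: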
SELECT *
FROM MyTable
WHERE number = (SELECT number
FROM MyTable AS T2
WHERE T2.id = MyTable.id - 1)
AND number = (SELECT number
FROM MyTable AS T2
WHERE T2.id = MyTable.id - 2);
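The expected output in the question lists every row of the run (ids 3, 4 and 5), not only the last one. One way to get that, sketched here under the assumption of a database with LAG/LEAD window functions (SQLite 3.25 or later, driven from Python), is to test whether a row is the end, the middle or the start of three equal consecutive values. The aliases p1, p2, n1 and n2 are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 2), (2, 6), (3, 7), (4, 7), (5, 7), (6, 1), (7, 2), (8, 4), (9, 7)],
)

# The answer's query: matches only rows whose two predecessors are equal to it.
last_of_run = conn.execute(
    """
    SELECT * FROM MyTable
    WHERE number = (SELECT number FROM MyTable AS T2 WHERE T2.id = MyTable.id - 1)
      AND number = (SELECT number FROM MyTable AS T2 WHERE T2.id = MyTable.id - 2)
    ORDER BY id
    """
).fetchall()

# Variant: a row qualifies if it is the end, middle, or start of three equal values.
whole_run = conn.execute(
    """
    SELECT id, number FROM (
        SELECT id, number,
               LAG(number, 1)  OVER (ORDER BY id) AS p1,
               LAG(number, 2)  OVER (ORDER BY id) AS p2,
               LEAD(number, 1) OVER (ORDER BY id) AS n1,
               LEAD(number, 2) OVER (ORDER BY id) AS n2
        FROM MyTable
    )
    WHERE (number = p1 AND number = p2)
       OR (number = p1 AND number = n1)
       OR (number = n1 AND number = n2)
    ORDER BY id
    """
).fetchall()

print(last_of_run)
print(whole_run)
```

Unlike the id - 1 arithmetic, LAG and LEAD follow the row order, so the variant does not depend on the ids being gap-free.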