I would like to query a database like this:
table person
id name age weight
1 teste1 18 101
1 teste2 18 102
1 teste3 18 103
1 teste4 18 104
1 teste5 18 105
1 teste6 18 106
2 teste7 18 91
2 teste8 18 92
2 teste9 18 93
2 teste9 18 94
2 teste1 18 95
2 teste2 18 96
3 teste3 18 87
3 teste3 18 88
3 teste3 18 89
3 teste3 18 81
3 teste3 18 82
3 teste3 18 83
3 teste3 18 84
3 teste3 18 85
and the result should be the 3 rows with the highest weight for each id, like this:
id name age weight
1 teste6 18 106
1 teste5 18 105
1 teste4 18 104
2 teste2 18 96
2 teste1 18 95
2 teste9 18 94
3 teste3 18 89
3 teste3 18 88
3 teste3 18 87
Can someone help me?
With the ROW_NUMBER() window function:
select t.id, t.name, t.age, t.weight
from (
select *, row_number() over (partition by id order by weight desc) rn
from tablename
) t
where t.rn <= 3
order by t.id, t.weight desc
Without window functions you can use a correlated subquery in the WHERE clause:
select t.id, t.name, t.age, t.weight
from tablename t
where (select count(*) from tablename where id = t.id and weight >= t.weight) <= 3
order by t.id, t.weight desc;
Results:
| id | name | age | weight |
| --- | ------ | --- | ------ |
| 1 | teste6 | 18 | 106 |
| 1 | teste5 | 18 | 105 |
| 1 | teste4 | 18 | 104 |
| 2 | teste2 | 18 | 96 |
| 2 | teste1 | 18 | 95 |
| 2 | teste9 | 18 | 94 |
| 3 | teste3 | 18 | 89 |
| 3 | teste3 | 18 | 88 |
| 3 | teste3 | 18 | 87 |
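The Python sketch below checks both queries against the question's sample data. It loads the rows into an in-memory SQLite table (named `tablename`, as in the answer) and runs the two queries from the answer. It assumes Python's bundled SQLite is 3.25 or newer, which the ROW_NUMBER() query needs. The correlated subquery also works on older versions. The helper name `top3_per_id` is only illustrative.

```python
import sqlite3

# Sample rows from the question: (id, name, age, weight)
ROWS = [
    (1, "teste1", 18, 101), (1, "teste2", 18, 102), (1, "teste3", 18, 103),
    (1, "teste4", 18, 104), (1, "teste5", 18, 105), (1, "teste6", 18, 106),
    (2, "teste7", 18, 91), (2, "teste8", 18, 92), (2, "teste9", 18, 93),
    (2, "teste9", 18, 94), (2, "teste1", 18, 95), (2, "teste2", 18, 96),
    (3, "teste3", 18, 87), (3, "teste3", 18, 88), (3, "teste3", 18, 89),
    (3, "teste3", 18, 81), (3, "teste3", 18, 82), (3, "teste3", 18, 83),
    (3, "teste3", 18, 84), (3, "teste3", 18, 85),
]

# Window-function version (needs SQLite 3.25+)
WINDOW_QUERY = """
select t.id, t.name, t.age, t.weight
from (
  select *, row_number() over (partition by id order by weight desc) rn
  from tablename
) t
where t.rn <= 3
order by t.id, t.weight desc
"""

# Correlated-subquery version (no window functions)
CORRELATED_QUERY = """
select t.id, t.name, t.age, t.weight
from tablename t
where (select count(*) from tablename
       where id = t.id and weight >= t.weight) <= 3
order by t.id, t.weight desc
"""


def top3_per_id(query):
    """Load the sample rows into an in-memory table and return query's rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table tablename (id integer, name text, age integer, weight integer)"
    )
    con.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in top3_per_id(WINDOW_QUERY):
        print(row)
```

The two queries differ only when there are ties. The correlated subquery counts rows with `weight >= t.weight`, so rows tied at third place each get a count above 3 and are all dropped, leaving that group with fewer than three rows. ROW_NUMBER() always keeps three rows when the group has at least three, and it picks arbitrarily among tied rows.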
I want to select a certain amount of rows randomly while the first and last samples are always selected.
Suppose I have a data frame df like this:
| A | B |
| -------- | -------------- |
| 1 | 10 |
| 2 | 158 |
| 3 | 106 |
| 4 | 155 |
| 5 | 130 |
| 6 | 154 |
| 7 | 160 |
| 8 | 157 |
| 9 | 140 |
| 10 | 158 |
| 11 | 210 |
| 12 | 157 |
| 13 | 140 |
| 14 | 156 |
| 15 | 160 |
| 16 | 135 |
| 17 | 102 |
| 18 | 150 |
| 19 | 120 |
| 20 | 12 |
From this table, I want to randomly select 5 rows. Rows 1 and 20 must always be selected, and the other 3 rows can be any of the remaining ones.
Right now I'm doing the following, but I don't know how to change it so that it does what I want.
n <- 5
shuffled <- df[sample(nrow(df)), ] #shuffles the entire dataframe
extracted <- shuffled[1:n, ] #extracts top 5 rows from the shuffled sample
I need to do this because I will further analyze the results.
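library(dplyr)  # only needed here for the starwars example data

# Return the rows at positions `fixed`, followed by `size` randomly chosen other rows
shuffle <- function(df, size, fixed = integer()) {
  ii <- setdiff(seq_len(nrow(df)), fixed)     # candidate rows, excluding the fixed ones
  picked <- ii[sample.int(length(ii), size)]  # index into ii so a length-1 ii is handled
  df[c(fixed, picked), ]
}
# For the question's data: shuffle(df, 3, fixed = c(1, nrow(df)))
shuffle(starwars, 3, fixed = c(1, 2))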
#> # A tibble: 5 × 14
#> name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex gender homew…⁵
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywal… 172 77 blond fair blue 19 male mascu… Tatooi…
#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu… Tatooi…
#> 3 Luminara Un… 170 56.2 black yellow blue 58 fema… femin… Mirial
#> 4 R4-P17 96 NA none silver… red, b… NA none femin… <NA>
#> 5 Mon Mothma 150 NA auburn fair blue 48 fema… femin… Chandr…
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> # starships <list>, and abbreviated variable names ¹hair_color, ²skin_color,
#> # ³eye_color, ⁴birth_year, ⁵homeworld
#> # ℹ Use `colnames()` to see all variable names
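Here is the same idea as a Python sketch over the question's 20-row table. It takes the fixed positions first, then draws the remaining rows without replacement from the other positions. That way the first and last rows are always included and no row appears twice. Positions are 0-based here, unlike R. The function name `sample_with_fixed` is hypothetical.

```python
import random

# Column B from the question's table; column A is simply 1..20.
B = [10, 158, 106, 155, 130, 154, 160, 157, 140, 158,
     210, 157, 140, 156, 160, 135, 102, 150, 120, 12]
df = [(a, b) for a, b in zip(range(1, 21), B)]


def sample_with_fixed(rows, size, fixed=()):
    """Return rows at the 0-based `fixed` positions, followed by `size`
    rows drawn at random, without replacement, from the other positions."""
    fixed = list(fixed)
    fixed_set = set(fixed)
    # Exclude the fixed positions so no row can be picked twice.
    rest = [i for i in range(len(rows)) if i not in fixed_set]
    chosen = fixed + random.sample(rest, size)
    return [rows[i] for i in chosen]


if __name__ == "__main__":
    # Always keep the first and last rows, plus 3 random others: 5 rows in total.
    print(sample_with_fixed(df, 3, fixed=[0, len(df) - 1]))
```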
My data frame contains options data with several variables. The relevant ones are date, days (days to expiration), mid and strike. Mid needs to be interpolated with a natural spline at a horizon of 30 days, and this interpolation has to be done separately for every date and every strike price.
The data table looks as follows:
| date       | days | mid | strike |
|------------|------|-----|--------|
| 2020-01-01 | 8    | 12  | 110    |
| 2020-01-01 | 28   | 14  | 110    |
| 2020-01-01 | 49   | 15  | 110    |
| 2020-01-01 | 80   | 17  | 110    |
| 2020-01-01 | 8    | 11  | 120    |
| 2020-01-01 | 28   | 12  | 120    |
| 2020-01-01 | 49   | 13  | 120    |
| 2020-01-01 | 80   | 14  | 120    |
| 2020-01-12 | 6    | 12  | 110    |
| 2020-01-12 | 26   | 14  | 110    |
| 2020-01-12 | 47   | 15  | 110    |
| 2020-01-12 | 82   | 17  | 110    |
| 2020-01-12 | 7    | 11  | 120    |
| 2020-01-12 | 27   | 12  | 120    |
| 2020-01-12 | 47   | 13  | 120    |
| 2020-01-12 | 85   | 14  | 120    |
This is just an example. The original data frame contains over 1 million rows, so a for loop is too slow and I want to interpolate by group instead.
I found some approaches online, unfortunately none of them really worked for me.
My last guess was:
df$id <- paste0(df$date, df$strike)
df %>%
  group_by(id) %>%
  mutate(mid_30 = spline(df$days, df$mid, xout = 30, method = "natural"))
Do you have any possible solution?
Thank you very much in advance!
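The Python sketch below shows the computation being asked for. It fits a natural cubic spline (second derivative zero at both ends) through the (days, mid) points of each (date, strike) group in the sample table, then evaluates it at 30 days. It is not R's `spline()`. It also refuses to extrapolate outside a group's day range, and the helper names are hypothetical.

Two details of the R attempt are worth checking. Inside a grouped `mutate()`, `df$days` and `df$mid` refer to the whole columns of `df`, not to the current group's rows. Also, `spline()` returns a list with `x` and `y` components, so the per-group value would be its `y`.

```python
from collections import defaultdict


def natural_spline_at(xs, ys, t):
    """Evaluate the natural cubic spline through (xs, ys) at t.

    xs must be strictly increasing and t must lie in [xs[0], xs[-1]].
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    if not xs[0] <= t <= xs[-1]:
        raise ValueError("t is outside the data range (no extrapolation)")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Second derivatives m; the natural boundary conditions fix m[0] = m[n-1] = 0.
    m = [0.0] * n
    if n > 2:
        # Tridiagonal system for m[1..n-2], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        k = n - 2
        for i in range(1, k):  # forward elimination
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        sol = [0.0] * k
        sol[-1] = rhs[-1] / diag[-1]
        for i in range(k - 2, -1, -1):  # back substitution
            sol[i] = (rhs[i] - sup[i] * sol[i + 1]) / diag[i]
        m[1:n - 1] = sol
    # Find the interval [xs[j], xs[j+1]] that contains t.
    j = 0
    while j < n - 2 and t > xs[j + 1]:
        j += 1
    hj = h[j]
    a, b = xs[j + 1] - t, t - xs[j]
    return (m[j] * a ** 3 / (6 * hj) + m[j + 1] * b ** 3 / (6 * hj)
            + (ys[j] / hj - m[j] * hj / 6) * a
            + (ys[j + 1] / hj - m[j + 1] * hj / 6) * b)


def interpolate_by_group(records, target=30.0):
    """records: iterable of (date, days, mid, strike) tuples.
    Returns {(date, strike): mid interpolated at `target` days}."""
    groups = defaultdict(list)
    for date, days, mid, strike in records:
        groups[(date, strike)].append((days, mid))
    out = {}
    for key, pts in groups.items():
        pts.sort()  # the spline needs increasing days within each group
        xs = [float(d) for d, _ in pts]
        ys = [float(v) for _, v in pts]
        out[key] = natural_spline_at(xs, ys, target)
    return out


# Sample rows from the question: (date, days, mid, strike)
RECORDS = [
    ("2020-01-01", 8, 12, 110), ("2020-01-01", 28, 14, 110),
    ("2020-01-01", 49, 15, 110), ("2020-01-01", 80, 17, 110),
    ("2020-01-01", 8, 11, 120), ("2020-01-01", 28, 12, 120),
    ("2020-01-01", 49, 13, 120), ("2020-01-01", 80, 14, 120),
    ("2020-01-12", 6, 12, 110), ("2020-01-12", 26, 14, 110),
    ("2020-01-12", 47, 15, 110), ("2020-01-12", 82, 17, 110),
    ("2020-01-12", 7, 11, 120), ("2020-01-12", 27, 12, 120),
    ("2020-01-12", 47, 13, 120), ("2020-01-12", 85, 14, 120),
]

if __name__ == "__main__":
    for key, value in sorted(interpolate_by_group(RECORDS).items()):
        print(key, round(value, 4))
```

Grouping with a dictionary keeps the work to one pass over the rows plus one small spline solve per group. That is the same "by group" structure a vectorised R solution would need.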
I am new to SQLite and am trying to work out how to return only the 3 highest values for each day.
Table : price
Date | Name | value1 |
21-08-2018 | A | 100
21-08-2018 | B | 90
21-08-2018 | C | 80
21-08-2018 | D | 70
21-08-2018 | E | 60
21-08-2018 | F | 50
22-08-2018 | B | 99
22-08-2018 | A | 88
22-08-2018 | D | 77
22-08-2018 | C | 66
22-08-2018 | E | 55
22-08-2018 | F | 44
23-08-2018 | D | 90
23-08-2018 | A | 80
23-08-2018 | B | 70
23-08-2018 | C | 80
23-08-2018 | F | 70
23-08-2018 | E | 60
I'm expecting a result like this:
Date | Name | value1 |
21-08-2018 | A | 100
21-08-2018 | B | 90
21-08-2018 | C | 80
22-08-2018 | B | 99
22-08-2018 | A | 88
22-08-2018 | D | 77
23-08-2018 | D | 90
23-08-2018 | A | 80
23-08-2018 | B | 70
I've tried:
select *, max (value1)
from price
group by (date)
but it only shows one row per date. I also tried TOP 3, but that only shows 3 rows out of the whole table.
If you're using SQLite 3.25 or newer, window functions make this easy:
WITH ranked AS (SELECT date, name, value1
, rank() OVER (PARTITION BY date ORDER BY value1 DESC) AS rank
FROM price)
SELECT date, name, value1
FROM ranked
WHERE rank <= 3
ORDER BY date, rank;
gives
date name value1
---------- ---------- ----------
21-08-2018 A 100
21-08-2018 B 90
21-08-2018 C 80
22-08-2018 B 99
22-08-2018 A 88
22-08-2018 D 77
23-08-2018 D 90
23-08-2018 A 80
23-08-2018 C 80
This can return more than three rows per date when duplicate value1 figures cause ties. Use row_number() instead of rank() to get at most three rows per date.
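The tie behaviour is easiest to see by running the query against the question's `price` table in an in-memory SQLite database, as in the Python sketch below. It assumes the bundled SQLite is 3.25 or newer and renames the alias to `rnk`. The ranking function and the cutoff `n` are passed in as parameters, and the function name `top_n_per_date` is hypothetical.

```python
import sqlite3

# Sample rows from the question's `price` table: (date, name, value1)
PRICES = [
    ("21-08-2018", "A", 100), ("21-08-2018", "B", 90), ("21-08-2018", "C", 80),
    ("21-08-2018", "D", 70), ("21-08-2018", "E", 60), ("21-08-2018", "F", 50),
    ("22-08-2018", "B", 99), ("22-08-2018", "A", 88), ("22-08-2018", "D", 77),
    ("22-08-2018", "C", 66), ("22-08-2018", "E", 55), ("22-08-2018", "F", 44),
    ("23-08-2018", "D", 90), ("23-08-2018", "A", 80), ("23-08-2018", "B", 70),
    ("23-08-2018", "C", 80), ("23-08-2018", "F", 70), ("23-08-2018", "E", 60),
]


def top_n_per_date(rank_fn="rank", n=3):
    """Return (date, name, value1) rows whose per-date rank is <= n.

    rank_fn="rank" keeps every row tied at the cutoff;
    rank_fn="row_number" never returns more than n rows per date.
    """
    if rank_fn not in ("rank", "row_number"):
        raise ValueError("rank_fn must be 'rank' or 'row_number'")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE price (date TEXT, name TEXT, value1 INTEGER)")
    con.executemany("INSERT INTO price VALUES (?, ?, ?)", PRICES)
    # rank_fn was checked against a fixed list above, so formatting it in is safe.
    query = f"""
        WITH ranked AS (
            SELECT date, name, value1,
                   {rank_fn}() OVER (PARTITION BY date ORDER BY value1 DESC) AS rnk
            FROM price)
        SELECT date, name, value1
        FROM ranked
        WHERE rnk <= ?
        ORDER BY date, rnk"""
    rows = con.execute(query, (n,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for fn in ("rank", "row_number"):
        print(fn, top_n_per_date(fn, n=2))
```

With n = 3, both functions return three rows per date on this data, because the tie on 23-08-2018 (A and C at 80) is at second place. With n = 2, rank() returns D, A and C for that date, while row_number() returns exactly two rows. Which of A and C row_number() keeps is not specified, because the window's ORDER BY does not break the tie.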
I am new to R. Your help here will be appreciated.
I have inputs such as:
columnA <- 14 # USERINPUT
columnB <- 1 # incremented 1, 2, 3, ... in each row
columnC <- columnA * columnB
columnD <- 25 # remains constant
columnE <- columnC / columnD
columnF <- 8 # remains constant
columnG <- columnE + columnF
mydf <- data.frame(columnA,columnB,columnC,columnD,columnE,columnF,columnG)
Based on the above data frame, I need to create a data frame in which columnB is incremented by 1 (1, 2, 3, ...) in each subsequent row. Rows should stop being created once columnG would go above 600. I tried to do this in Excel; below is roughly the output I need.
+---------+---------+---------+---------+---------+---------+---------+
| columnA | columnB | columnC | columnD | columnE | columnF | columnG |
+---------+---------+---------+---------+---------+---------+---------+
| 14      | 1       | 14      | 25      | 0.56    | 8       | 8.56    |
| 14      | 2       | 28      | 25      | 1.12    | 8       | 9.12    |
| 14      | 3       | 42      | 25      | 1.68    | 8       | 9.68    |
| 14      | 4       | 56      | 25      | 2.24    | 8       | 10.24   |
| 14      | 5       | 70      | 25      | 2.8     | 8       | 10.8    |
| 14      | 6       | 84      | 25      | 3.36    | 8       | 11.36   |
| 14      | 7       | 98      | 25      | 3.92    | 8       | 11.92   |
| 14      | 8       | 112     | 25      | 4.48    | 8       | 12.48   |
+---------+---------+---------+---------+---------+---------+---------+
The end result should be a data frame.
First, compute the length of the data.frame, i.e. the largest columnB for which columnG = 14 * columnB / 25 + 8 stays at or below 600:
userinput <- 14
N <- (600 - 8) * 25 / userinput
Then create the data.frame with dplyr:
library(dplyr)
mydf <- tibble(ColA = userinput, ColB = 1:floor(N), ColD = 25, ColF = 8) %>%
  mutate(ColC = ColA * ColB, ColE = ColC / ColD, ColG = ColE + ColF)
If you need the columns in the correct order:
> mydf <- mydf %>% select(ColA, ColB, ColC, ColD, ColE, ColF, ColG)
> mydf
ColA ColB ColC ColD ColE ColF ColG
1: 14 1 14 25 0.56 8 8.56
2: 14 2 28 25 1.12 8 9.12
3: 14 3 42 25 1.68 8 9.68
4: 14 4 56 25 2.24 8 10.24
5: 14 5 70 25 2.80 8 10.80
---
1053: 14 1053 14742 25 589.68 8 597.68
1054: 14 1054 14756 25 590.24 8 598.24
1055: 14 1055 14770 25 590.80 8 598.80
1056: 14 1056 14784 25 591.36 8 599.36
1057: 14 1057 14798 25 591.92 8 599.92
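For comparison, the Python sketch below builds the rows one at a time instead of computing N up front. It keeps increasing columnB and stops just before the first row whose columnG would exceed 600. With columnA = 14 it should produce 1057 rows, the same as floor(N) above. The function name `build_rows` is illustrative.

```python
import math


def build_rows(column_a, column_d=25, column_f=8, limit=600):
    """Build (A, B, C, D, E, F, G) rows with B = 1, 2, 3, ... while G <= limit,
    using C = A * B, E = C / D and G = E + F as in the question."""
    if column_a <= 0:
        raise ValueError("column_a must be positive, otherwise G never exceeds the limit")
    rows = []
    column_b = 1
    while True:
        column_c = column_a * column_b
        column_e = column_c / column_d
        column_g = column_e + column_f
        if column_g > limit:  # stop before the first row above the limit
            break
        rows.append((column_a, column_b, column_c, column_d,
                     column_e, column_f, column_g))
        column_b += 1
    return rows


if __name__ == "__main__":
    rows = build_rows(14)
    print(len(rows), rows[0], rows[-1])
    # Closed form used in the R answer: floor((600 - 8) * 25 / 14)
    print(math.floor((600 - 8) * 25 / 14))
```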
I have a data frame like this:
ID <- c(101,101,101,102,102,102,103,103,103)
Pt_A <- c(50,100,150,20,30,40,60,80,90)
df <- data.frame(ID,Pt_A)
+-----+------+
| ID | Pt_A |
+-----+------+
| 101 | 50 |
| 101 | 100 |
| 101 | 150 |
| 102 | 20 |
| 102 | 30 |
| 102 | 40 |
| 103 | 60 |
| 103 | 80 |
| 103 | 90 |
+-----+------+
I want to create 2 new columns with values calculated from Pt_A column.
df$Del_Pt_A <- NthRow(Pt_A) - 1stRow(Pt_A) grouped by ID, where n = 1,2,...n
df$Perc_Pt_A <- NthRow(Del_Pt_A) / 1stRow(Pt_A) grouped by ID, where n = 1,2,...n
Here is my desired output
+-----+------+---------+-----------+
| ID | Pt_A | Del_Pt_A | Perc_Pt_A|
+-----+------+---------+-----------+
| 101 | 50 | 0 | 0 |
| 101 | 100 | 50 | 1.0 |
| 101 | 150 | 100 | 2.0 |
| 102 | 20 | 0 | 0 |
| 102 | 30 | 10 | 0.5 |
| 102 | 40 | 20 | 1.0 |
| 103 | 60 | 0 | 0 |
| 103 | 80 | 20 | 0.3 |
| 103 | 90 | 30 | 0.5 |
+-----+------+---------+-----------+
I currently get the desired result in MS Excel, but I want to learn to do it in R to make my work more efficient. I came across packages like dplyr, plyr and data.table, but I couldn't solve it with them. Could someone please help me figure out how to do this?
Here's a data.table way:
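library(data.table)
# Within each ID, compare every row with the group's first row (Pt_A[1])
setDT(df)[, `:=`(
  del  = Pt_A - Pt_A[1],     # difference from the first value
  perc = Pt_A / Pt_A[1] - 1  # relative change, i.e. del / Pt_A[1]
), by = ID]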
which gives
ID Pt_A del perc
1: 101 50 0 0.0000000
2: 101 100 50 1.0000000
3: 101 150 100 2.0000000
4: 102 20 0 0.0000000
5: 102 30 10 0.5000000
6: 102 40 20 1.0000000
7: 103 60 0 0.0000000
8: 103 80 20 0.3333333
9: 103 90 30 0.5000000
Here is another option in base R:
cbind(df,
do.call(rbind,by(df,df$ID,
function(x)
setNames(data.frame(x$Pt_A-x$Pt_A[1],
x$Pt_A/x$Pt_A[1]-1),
c('Del_Pt_A','Perc_Pt_A')))))
# ID Pt_A Del_Pt_A Perc_Pt_A
# 101.1 101 50 0 0.0000000
# 101.2 101 100 50 1.0000000
# 101.3 101 150 100 2.0000000
# 102.1 102 20 0 0.0000000
# 102.2 102 30 10 0.5000000
# 102.3 102 40 20 1.0000000
# 103.1 103 60 0 0.0000000
# 103.2 103 80 20 0.3333333
# 103.3 103 90 30 0.5000000
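I am using:
by() to apply a function to each group; the result is a list
do.call(rbind, list_by) to turn that list into a single data.frame
cbind() to add the result to the initial data.frame
Because cbind() matches rows by position and by() returns the groups in sorted ID order, this assumes df is already sorted by ID, as it is here.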
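The Python sketch below applies the same per-group logic to the sample ID and Pt_A vectors. It remembers the first Pt_A seen for each ID and compares every row with it. Unlike the positional cbind() approach, it does not need the rows to be sorted by ID. It does assume that the first row of each ID is the baseline and that this value is non-zero. The function name `add_first_row_deltas` is hypothetical.

```python
# Sample data from the question
ID = [101, 101, 101, 102, 102, 102, 103, 103, 103]
PT_A = [50, 100, 150, 20, 30, 40, 60, 80, 90]


def add_first_row_deltas(ids, values):
    """Return (id, value, del, perc) for each row, where del = value - first
    and perc = value / first - 1, with `first` the first value seen for that id."""
    first = {}
    out = []
    for key, value in zip(ids, values):
        base = first.setdefault(key, value)  # stores the first value per id
        out.append((key, value, value - base, value / base - 1))
    return out


if __name__ == "__main__":
    for row in add_first_row_deltas(ID, PT_A):
        print(row)
```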