sqlite multiple query conditions

I've searched but can't find the right answer, and I'm going round in circles.
I have
CREATE TABLE History (yr Int, output Int, cat Text);
yr output cat
---------- ---------- ----------
2015 10 a
2016 20 a
2017 30 a
2018 50 a
2019 70 a
2015 100 b
2016 200 b
2017 300 b
2018 500 b
2019 700 b
2015 1000 c
2016 2000 c
2017 3000 c
2018 5000 c
2019 7000 c
2015 10000 d
2016 20000 d
2017 30000 d
2018 50000 d
2019 70000 d
I've created two views
CREATE VIEW Core AS select * from History where cat = "c" or cat = "d";
CREATE VIEW Plus AS select * from History where cat = "a" or cat = "b";
My query is
select distinct yr, sum(output), (select sum(output) from core group by yr) as _core, (select sum(output) from plus group by yr) as _plus from history group by yr;
yr sum(output) _core _plus
---------- ----------- ---------- ----------
2015 11110 11000 110
2016 22220 11000 110
2017 33330 11000 110
2018 55550 11000 110
2019 77770 11000 110
Each of the individual queries works, but the _core and _plus columns are wrong when it's all put together. How should I approach this, please?

The scalar subqueries in your query are not correlated with the outer yr: each (select sum(output) from core group by yr) returns one row per year, and SQLite silently keeps only the first of them, which is why _core and _plus repeat the 2015 totals on every row. You can generate your expected output without views at all, using a single query with conditional aggregation:
SELECT
    yr,
    SUM(output) AS sum_output,
    SUM(CASE WHEN cat IN ('c', 'd') THEN output ELSE 0 END) AS _core,
    SUM(CASE WHEN cat IN ('a', 'b') THEN output ELSE 0 END) AS _plus
FROM History
GROUP BY yr;
If you really wanted to make your current approach work, one way would be to just join the two views by year. But that would leave open the possibility that each view might not have every year present.
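For reference, a sketch of that view-based variant: it keeps your Core and Plus views, pre-aggregates each one by year, and joins the per-year totals, with LEFT JOINs guarding against a year that is missing from one of the views:
SELECT h.yr,
       h.sum_output,
       c._core,
       p._plus
FROM (SELECT yr, SUM(output) AS sum_output FROM History GROUP BY yr) AS h
LEFT JOIN (SELECT yr, SUM(output) AS _core FROM Core GROUP BY yr) AS c ON c.yr = h.yr
LEFT JOIN (SELECT yr, SUM(output) AS _plus FROM Plus GROUP BY yr) AS p ON p.yr = h.yr;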

Related

How to execute a left join in R?

Below are the sample data and one manipulation. The first data set is employment specific to an industry. The second data set is overall employment and the unemployment rate. I am seeking to do a left join (or at least that's what I think it should be) to achieve the desired result below. When I do it, I get a one-to-many issue and the row count grows: in this example it goes from 14 to 18, and in the larger data set from 228 to 4348. The primary question is whether this can be done with only a properly written join, or is there more to it?
area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
periodyear<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
month<-c(1,2,3,4,5,6,7,8,9,10,11,12,1,2)
emp1 <-c(10,11,12,13,14,15,16,17,20,21,22,24,26,28)
firstset<-data.frame(area1,periodyear,month,emp1)
area2<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
periodyear1<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
period<-c(01,02,03,04,05,06,07,08,09,10,11,12,01,02)
rate<-c(3.0,3.2,3.4,3.8,2.5,4.5,6.5,9.1,10.6,5.5,7.8,6.5,4.5,2.9)
emp2<-c(1001,1002,1005,1105,1254,1025,1078,1106,1099,1188,1254,1250,1301,1188)
secondset<-data.frame(area2,periodyear1,period,rate,emp2)
library(dplyr) # needed for %>%, mutate() and left_join()
secondset <- secondset %>% mutate(month = as.numeric(period))
secondset <- left_join(firstset,secondset, by=c("month"))
Desired Result (14 rows with below being the first 3)
area1 periodyear month emp1 rate emp2
000000 2020 1 10 3.0 1001
000000 2020 2 11 3.2 1002
000000 2020 3 12 3.4 1005
We may have to add 'periodyear' (and 'area1') in the by as well:
library(dplyr)
left_join(firstset,secondset, by=c("periodyear" = "periodyear1",
"area1" = "area2", "month"))
Output:
area1 periodyear month emp1 period rate emp2
1 0 2020 1 10 1 3.0 1001
2 0 2020 2 11 2 3.2 1002
3 0 2020 3 12 3 3.4 1005
...
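If you prefer to avoid dplyr for this step, base R's merge() accepts different key names on each side and joins the same 14 rows; a minimal sketch under the same column names:
merge(firstset, secondset,
      by.x = c("area1", "periodyear", "month"),
      by.y = c("area2", "periodyear1", "month"))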

Resampling in nested groups in R

I have run across similar questions, but have not been able to find an answer for my specific needs.
I have a data set with a nested group design and I need to randomly sample (with replacement) within each group and the number of resampling events must equal the number of samples (i.e., rows) per group. Additionally, the nested groups have multiple columns of data. See the example df below.
I have code using the dplyr package, but am moving away from dplyr as I have to continuously update my code as dplyr changes function names and operations...which is annoying to say the least. Yes...I know there are several ways to circumvent this issue, but I have decided it is time to cast aside the dplyr crutches and learn how to do data wrangling in base R.
Working dplyr code:
Resample_function = function(Boot) {
  group_by(data1, GROUP, YEAR) %>%
    slice(sample(n(), replace = TRUE)) %>%
    ungroup()
}
I have tried to use various combinations of aggregate, ave, and the apply family of functions...but my ability to deal with nested group designs in base package is limited to say the least.
Below I have provided an example data set (df) and what the results should look like. Note that the resampling procedure will produce different results, but the number of resamples per nested group should be the same.
One final request...I am open to all options (e.g., library(data.table), library(boot), etc.) as it would be great if others find this post useful. Additionally, some of these packages can be more efficient than base R. However, I prefer solutions that do not require the installation and loading of additional packages.
Thanks in advance for your help.
Take care.
df <- read.table(text = "GROUP YEAR VAR1 VAR2
a 2018 1.0 1.0
a 2018 2.0 2.0
b 2018 10 10
b 2018 20 20
b 2018 30 30
b 2018 40 40
b 2019 50 50
b 2019 60 60
b 2019 70 70
b 2019 80 80
b 2019 90 90
b 2019 100 100
b 2019 110 110
b 2019 120 120
b 2019 130 130
b 2019 140 140
b 2019 150 150
b 2019 160 160
b 2019 170 170
b 2019 180 180
b 2020 190 190
b 2020 200 200
b 2020 210 210", header = TRUE)
result <- read.table(text = "GROUP YEAR VAR1 VAR2
a 2018 1 1
a 2018 1 1
b 2018 20 20
b 2018 30 30
b 2018 30 30
b 2018 20 20
b 2019 70 70
b 2019 170 170
b 2019 50 50
b 2019 150 150
b 2019 70 70
b 2019 150 150
b 2019 100 100
b 2019 120 120
b 2019 50 50
b 2019 160 160
b 2019 90 90
b 2019 150 150
b 2019 170 170
b 2019 180 180
b 2020 190 190
b 2020 190 190
b 2020 190 190", header = TRUE)
You can perform this kind of shuffling in base R using ave():
Resample_function <- function(data) {
  # ave() returns, for each row, a row index sampled with replacement from
  # that row's own GROUP/YEAR cell; indexing the data frame with these
  # indices performs the within-group resampling.
  new_data <- data[with(data, ave(seq(nrow(data)), GROUP, YEAR,
                                  FUN = function(x) sample(x, replace = TRUE))), ]
  rownames(new_data) <- NULL
  return(new_data)
}
Resample_function(df)
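Since you mention being open to other approaches, here is another base R sketch that makes the per-group sampling explicit with split() and lapply(); it assumes the same df as above:
resample_split <- function(data) {
  # Split into GROUP/YEAR cells, resample each cell's rows with replacement,
  # then stack the pieces back together.
  parts <- split(data, list(data$GROUP, data$YEAR), drop = TRUE)
  out <- do.call(rbind, lapply(parts, function(g) g[sample(nrow(g), replace = TRUE), , drop = FALSE]))
  rownames(out) <- NULL
  out
}
resample_split(df)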

Converting raw data into long format

I am importing data that is neither long nor wide:
clear
input str1 id purchased sold
A 2017 .
B . .
C 2016 2019
C 2018 .
D 2018 2019
D 2018 .
end
My goal is to get the data in the following long format, reflecting the count in each year:
Identifier Year Inventory
A 2016 0
A 2017 1
A 2018 1
A 2019 1
B 2016 0
B 2017 0
B 2018 0
B 2019 0
C 2016 1
C 2017 1
C 2018 2
C 2019 1
D 2016 0
D 2017 0
D 2018 2
D 2019 1
My initial approach would be to transform it first into a wide format, that is, having only one row per identifier and adding columns for the years 2016-2019, and then converting this format into the desired long format. However, this seems inefficient.
Is there any shorter and more efficient method to do this, as I have a much larger dataset?
This needs several small tricks. The most crucial are reshape long and fillin.
The inventory is essentially a running sum of purchases minus sales.
clear
input str1 Identifier Purchased Sold
A 2017 .
B . .
C 2016 2019
C 2018 .
D 2018 2019
D 2018 .
end
generate long id = _n
rename (Purchased Sold) year=
reshape long year, i(id) j(Event) string
drop id
fillin Id year
drop _fillin
drop if missing(year)
bysort Id (year Event) : generate inventory = sum((Event == "Purchased") - (Event == "Sold"))
drop Event
bysort Id year : keep if _n == _N
list, sepby(Id)
+----------------------------+
| Identi~r year invent~y |
|----------------------------|
1. | A 2016 0 |
2. | A 2017 1 |
3. | A 2018 1 |
4. | A 2019 1 |
|----------------------------|
5. | B 2016 0 |
6. | B 2017 0 |
7. | B 2018 0 |
8. | B 2019 0 |
|----------------------------|
9. | C 2016 1 |
10. | C 2017 1 |
11. | C 2018 2 |
12. | C 2019 1 |
|----------------------------|
13. | D 2016 0 |
14. | D 2017 0 |
15. | D 2018 2 |
16. | D 2019 1 |
+----------------------------+

sqlite self join with where clause

I have a table that consists of names, points, and years. I need a query that returns all the names for a specific year, even if a name has no row in that year. Example:
Name Points Year
------- -------
tom 8 2011
jim 45 2011
jerry 25 2011
zack 124 2011
jeff 45 2011
tom 62 2012
jim 214 2012
jerry 13 2012
zack 32 2012
arnold 4 2012
Desired output (for 2011):
Name Points Year
------- ------- -------
tom 8 2011
jim 45 2011
jerry 25 2011
zack 124 2011
jeff 45 2011
arnold NULL NULL
I figured this would be easy but I am struggling to make it work.
From your explanation, I'm thinking you could use something like this:
SELECT DISTINCT
N.`Name`,
D.`Points`,
Y.`Year`
FROM
`MyData` Y
LEFT JOIN (SELECT DISTINCT `Name` FROM `MyData`) N ON 1=1
LEFT JOIN `MyData` D
ON D.`Year` = Y.`Year`
AND D.`Name` = N.`Name`
ORDER BY
Y.`Year`
While it's not pretty, it does seem to work as intended.
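If you only ever need one year at a time, a simpler variant is to left join the distinct names against just that year; a sketch using the same MyData table name and the 2011 example (names with no row in that year come back with NULL Points and Year):
SELECT n.Name, d.Points, d.Year
FROM (SELECT DISTINCT Name FROM MyData) AS n
LEFT JOIN MyData AS d
  ON d.Name = n.Name AND d.Year = 2011
ORDER BY n.Name;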

Query formatting

I have a table with the rows below:
Name Month Salary Expense
John Jan 1000 50
John Feb 5000 2000
Jack Jan 3000 100
I want to get output in the format below, with salary and expense side by side for each month. How can I achieve this?
Name  Jan_Salary  Jan_Expense  Feb_Salary  Feb_Expense
John  1000        50           5000        2000
Jack  3000        100          0           0
This SQL Server query would work (ISNULL is SQL Server specific):
select name,
  isnull(max(case when month = 'Jan' then salary end), 0) as Salary_jan,
  isnull(max(case when month = 'Feb' then salary end), 0) as Salary_feb
  -- and so on for the other months and for expense
from YourTable -- the question does not name the table; substitute yours
group by name
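For the full desired output (salary and expense for each month), and for engines without ISNULL such as SQLite or PostgreSQL, the same conditional-aggregation pattern works with COALESCE; a sketch assuming the table is called Salaries, since the question does not name it:
select name,
  coalesce(max(case when month = 'Jan' then salary  end), 0) as Salary_jan,
  coalesce(max(case when month = 'Jan' then expense end), 0) as Expense_jan,
  coalesce(max(case when month = 'Feb' then salary  end), 0) as Salary_feb,
  coalesce(max(case when month = 'Feb' then expense end), 0) as Expense_feb
  -- and so on for any remaining months
from Salaries
group by name;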
