I have an existing SQLite table like this:
startdate - enddate
2018-01-01 - 2018-06-30
2018-07-01 - 2018-12-31
2019-01-01 - 2019-06-30
2019-07-01 - 2019-12-31
2020-01-01 - 2020-06-30
2020-07-01 - 2020-12-31
2021-01-01 - 2021-06-30
What SQL statement gives the following result?
2019-11-01 2019-12-31 // 60 days difference
2020-01-01 2020-06-30 // 181 days difference
2020-07-01 2020-12-31 // 183 days difference
2021-01-01 2021-06-30 // 180 days difference
The date '2019-11-01' is entered as input in the search field of my Android app.
The tricky part is producing the short period at the beginning, i.e. inserting that first (partial) period into the result.
I tried UNION, but it gives me an error.
How can I do this with a query?
I am thankful for any help.
Also, what is the SQL statement if it should return data from '2019-11-01' until now (today's date, 2021-02-16 in the example below)?
2019-11-01 2019-12-31 // 60 days difference
2020-01-01 2020-06-30 // 181 days difference
2020-07-01 2020-12-31 // 183 days difference
2021-01-01 2021-02-16 // 46 days difference (cut off at today's date)
Thanks
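You need all the rows where enddate is greater than or equal to '2019-11-01'. SQLite's scalar (multi-argument) MAX() then clips the first period so that it starts at '2019-11-01'; comparing the dates as text works because they are in YYYY-MM-DD format: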
SELECT MAX(startdate, '2019-11-01') startdate, enddate
FROM tablename
WHERE enddate >= '2019-11-01'
Results:
startdate | enddate
2019-11-01 | 2019-12-31
2020-01-01 | 2020-06-30
2020-07-01 | 2020-12-31
2021-01-01 | 2021-06-30
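Because the search date comes from the app's search field, it is safer to pass it as a bound parameter than to splice it into the SQL string. A minimal sketch using Python's standard sqlite3 module, assuming a table named periods (hypothetical; the answer uses tablename) with TEXT dates in YYYY-MM-DD format. The helper names make_db() and periods_from() are also hypothetical. It additionally computes the day differences from the question with julianday():

```python
import sqlite3

# The periods from the question; "periods" is a hypothetical table name.
PERIODS = [
    ("2018-01-01", "2018-06-30"), ("2018-07-01", "2018-12-31"),
    ("2019-01-01", "2019-06-30"), ("2019-07-01", "2019-12-31"),
    ("2020-01-01", "2020-06-30"), ("2020-07-01", "2020-12-31"),
    ("2021-01-01", "2021-06-30"),
]


def make_db(periods):
    """Build an in-memory database holding the given (startdate, enddate) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE periods (startdate TEXT, enddate TEXT)")
    conn.executemany("INSERT INTO periods VALUES (?, ?)", periods)
    return conn


def periods_from(conn, start):
    """Return (startdate, enddate, days) for every period ending on or after start.

    The scalar MAX() clips the first period so that it begins at start.
    """
    sql = """
        SELECT MAX(startdate, :start) AS startdate,
               enddate,
               CAST(julianday(enddate) - julianday(MAX(startdate, :start)) AS INTEGER) AS days
        FROM periods
        WHERE enddate >= :start
        ORDER BY startdate
    """
    return conn.execute(sql, {"start": start}).fetchall()


conn = make_db(PERIODS)
for row in periods_from(conn, "2019-11-01"):
    print(row)
```

For '2019-11-01' this returns the four rows above with 60, 181, 183 and 180 days, so the half-year periods are close to, but not exactly, 180 days long.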
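Edit, for your second question: also clip each enddate with MIN(enddate, CURRENT_DATE) and keep only the periods that have already started (startdate <= CURRENT_DATE). Note that CURRENT_DATE is the current date in UTC; use date('now', 'localtime') if you need the device's local date: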
SELECT MAX(startdate, '2019-11-01') startdate,
MIN(enddate, CURRENT_DATE) enddate
FROM tablename
WHERE enddate >= '2019-11-01' AND startdate <= CURRENT_DATE
Results:
startdate | enddate
2019-11-01 | 2019-12-31
2020-01-01 | 2020-06-30
2020-07-01 | 2020-12-31
2021-01-01 | 2021-02-16
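For the "until now" variant, the sketch below (Python's sqlite3; the periods table and the helpers make_db() and periods_until() are hypothetical names) takes today's date as an optional parameter so the example can be reproduced, and falls back to date('now', 'localtime') when it is omitted. It assumes TEXT dates in YYYY-MM-DD format, as in the question.

```python
import sqlite3

# The periods from the question; "periods" is a hypothetical table name.
PERIODS = [
    ("2018-01-01", "2018-06-30"), ("2018-07-01", "2018-12-31"),
    ("2019-01-01", "2019-06-30"), ("2019-07-01", "2019-12-31"),
    ("2020-01-01", "2020-06-30"), ("2020-07-01", "2020-12-31"),
    ("2021-01-01", "2021-06-30"),
]


def make_db(periods):
    """Build an in-memory database holding the given (startdate, enddate) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE periods (startdate TEXT, enddate TEXT)")
    conn.executemany("INSERT INTO periods VALUES (?, ?)", periods)
    return conn


def periods_until(conn, start, today=None):
    """Clip the periods to the range [start, today].

    If today is None, SQLite's local date, date('now', 'localtime'), is used.
    """
    sql = """
        SELECT MAX(startdate, :start) AS startdate,
               MIN(enddate, COALESCE(:today, date('now', 'localtime'))) AS enddate
        FROM periods
        WHERE enddate >= :start
          AND startdate <= COALESCE(:today, date('now', 'localtime'))
        ORDER BY startdate
    """
    return conn.execute(sql, {"start": start, "today": today}).fetchall()


conn = make_db(PERIODS)
print(periods_until(conn, "2019-11-01", "2021-02-16"))
```

With today set to '2021-02-16' it returns the four rows of the expected output, the last one ending on 2021-02-16.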
Related
I have a DF with a date column. I want to check whether each date is before or after 1 January 2020 and create a new column: if the original date is before 1 January 2020, insert 2020-01-01; otherwise, copy the original date.
The dates are in YYYY-MM-DD format.
Beginning End
2020-12-31 2021-01-12
2018-01-02 2020-03-10
2019-04-12 2020-12-04
2020-10-15 2021-03-27
I want:
Beginning End Beginning_2
2020-12-31 2021-01-12 2020-12-31
2018-01-02 2020-03-10 2020-01-01
2019-04-12 2020-12-04 2020-01-01
2020-10-15 2021-03-27 2020-10-15
The code I wrote is:
DF$Beginning_2 <- ifelse("2020-01-01" > DF$Beginning,"2020-01-01", DF$Beginning)
I'm getting this:
Beginning End Beginning_2
2020-12-31 2021-01-12 18554
2018-01-02 2020-03-10 2020-01-01
2019-04-12 2020-12-04 2020-01-01
2020-10-15 2021-03-27 18453
My code only half works: it turns the new column into character, and dates after 2020-01-01 show up as numbers such as "18554". I need the column to stay a Date. I tried putting as.Date() all over the code, but not much changed; the biggest change was that dates after 2020-01-01 became NA instead of "18554".
How can I fix my code?
Thank you
You can use pmax(), which takes the element-wise maximum and keeps the Date class:
DF$Beginning_2 <- pmax(DF$Beginning, as.Date("2020-01-01"))
#DF$Beginning_2 <- pmax(DF$Beginning, "2020-01-01") #Works also
DF
# Beginning End Beginning_2
#1 2020-12-31 2021-01-12 2020-12-31
#2 2018-01-02 2020-03-10 2020-01-01
#3 2019-04-12 2020-12-04 2020-01-01
#4 2020-10-15 2021-03-27 2020-10-15
str(DF)
#'data.frame': 4 obs. of 3 variables:
# $ Beginning : Date, format: "2020-12-31" "2018-01-02" ...
# $ End : Date, format: "2021-01-12" "2020-03-10" ...
# $ Beginning_2: Date, format: "2020-12-31" "2020-01-01" ...
Base R ifelse() drops the Date class and returns the dates as numbers (days since 1970-01-01), so you need to convert them back to dates:
DF$Beginning_2 <- as.Date(ifelse(DF$Beginning > as.Date("2020-01-01"),
DF$Beginning, as.Date("2020-01-01")), origin = '1970-01-01')
You may use dplyr::if_else which will maintain the class of the date columns.
DF$Beginning_2 <- dplyr::if_else(DF$Beginning > as.Date("2020-01-01"),
DF$Beginning, as.Date("2020-01-01"))
DF
# Beginning End Beginning_2
#1 2020-12-31 2021-01-12 2020-12-31
#2 2018-01-02 2020-03-10 2020-01-01
#3 2019-04-12 2020-12-04 2020-01-01
#4 2020-10-15 2021-03-27 2020-10-15
The data table looks like the following:
ID DATE
1 2020-12-31 10:10:00
2 2020-12-31 20:30:00
3 2020-12-31 20:50:00
4 2021-01-02 17:10:00
5 2021-01-02 17:20:00
6 2021-01-02 17:30:00
7 2021-01-03 23:10:00
..
I would like to query only the last entry per hour per day, so that the result looks like this:
ID DATE
1 2020-12-31 10:10:00
3 2020-12-31 20:50:00
6 2021-01-02 17:30:00
7 2021-01-03 23:10:00
..
I looked for an hourly query and found the following:
strftime('%H', " + DATE + ", '+1 hours')
However, I'm not sure how to use it properly (e.g. with GROUP BY? Then how do I make sure it takes the latest entry of each hour?), so some help here would be great!
You can do it with the ROW_NUMBER() window function: number the rows of each hour from the latest down and keep row 1:
SELECT ID, DATE
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY strftime('%Y%m%d%H', DATE) ORDER BY DATE DESC) rn
FROM tablename
)
WHERE rn = 1
ORDER BY ID
Instead of strftime('%Y%m%d%H', DATE) you could also use substr(DATE, 1, 13).
For SQLite versions earlier than 3.25.0, which do not support window functions, you can do it with NOT EXISTS:
SELECT t1.*
FROM tablename t1
WHERE NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE strftime('%Y%m%d%H', t2.DATE) = strftime('%Y%m%d%H', t1.DATE)
AND t2.DATE > t1.DATE
)
Results:
> ID | DATE
> -: | :------------------
> 1 | 2020-12-31 10:10:00
> 3 | 2020-12-31 20:50:00
> 6 | 2021-01-02 17:30:00
> 7 | 2021-01-03 23:10:00
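Regarding the GROUP BY idea from the question: SQLite documents a special case where, in an aggregate query that uses a single max() (or min()), bare columns take their values from the row holding that maximum. That allows a GROUP BY sketch that also runs on SQLite versions without window functions. It relies on SQLite-specific behavior and is not portable to other databases, and if two rows tie for the latest time within an hour, only one of them is returned. The table name readings and the helpers make_db() and last_per_hour() are hypothetical; DATE is assumed to be TEXT in YYYY-MM-DD HH:MM:SS form.

```python
import sqlite3

ROWS = [
    (1, "2020-12-31 10:10:00"), (2, "2020-12-31 20:30:00"),
    (3, "2020-12-31 20:50:00"), (4, "2021-01-02 17:10:00"),
    (5, "2021-01-02 17:20:00"), (6, "2021-01-02 17:30:00"),
    (7, "2021-01-03 23:10:00"),
]


def make_db(rows):
    """Build an in-memory table "readings" (hypothetical name) with ID and DATE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ID INTEGER PRIMARY KEY, DATE TEXT)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def last_per_hour(conn):
    """Return (ID, DATE) of the latest row in each calendar hour."""
    sql = """
        SELECT ID, MAX(DATE) AS latest   -- bare column ID comes from the max row
        FROM readings
        GROUP BY substr(DATE, 1, 13)     -- 'YYYY-MM-DD HH' identifies the hour
        ORDER BY ID
    """
    return conn.execute(sql).fetchall()


print(last_per_hour(make_db(ROWS)))
```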
I'm currently building some charts of COVID-related data. My script downloads the most recent data and goes from there. I end up with data frames that look like this:
head(NMdata)
Date state positiveIncrease totalTestResultsIncrease
1 2020-05-19 NM 158 4367
2 2020-05-18 NM 81 4669
3 2020-05-17 NM 195 4126
4 2020-05-16 NM 159 4857
5 2020-05-15 NM 139 4590
6 2020-05-14 NM 152 4722
I've been aggregating to weekly data using the tq_transmute function from tidyquant.
NMweeklyPos <- NMdata %>% tq_transmute(select = positiveIncrease, mutate_fun = apply.weekly, FUN=sum)
This works, but it aggregates by week of the year, with weeks ending on Sunday (each row is labeled with the last date of its week).
head(NMweeklyPos)
Date positiveIncrease
<dttm> <int>
1 2020-03-08 00:00:00 0
2 2020-03-15 00:00:00 13
3 2020-03-22 00:00:00 44
4 2020-03-29 00:00:00 180
5 2020-04-05 00:00:00 306
6 2020-04-12 00:00:00 631
So, for instance, if I run it today (which happens to be a Wednesday), my last entry is a partial week covering only the days since Monday.
tail(NMweeklyPos)
Date positiveIncrease
<dttm> <int>
1 2020-04-19 00:00:00 624
2 2020-04-26 00:00:00 862
3 2020-05-03 00:00:00 1072
4 2020-05-10 00:00:00 1046
5 2020-05-17 00:00:00 1079
6 2020-05-19 00:00:00 239
For purposes of my chart this winds up being a small value, and so I have been discarding the partial weeks at the end, but that means I'm throwing out the most recent data.
I would prefer to throw out a partial week at the start of the dataset and have the aggregation automatically use weeks that end on whatever day the script is run. So if I ran it today (Wednesday), it would aggregate on weeks ending Wednesday, so that the most current data is included, and I could drop the partial week from the beginning of the data. Tomorrow it would choose weeks ending Thursday, and so on. I don't want to hardcode the week-end day and change it each time.
How can I go about achieving that?
Using lubridate, the code below finds the weekday of end (the run date) and makes each week start on the following day, so every week ends on the same weekday as end and the most recent week is complete.
Hope this helps!
library(lubridate)
library(dplyr)
end = as.Date("2020-04-14")
data = data.frame(
date = seq.Date(as.Date("2020-01-01"), end, by = "day"),
val = 1
)
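# floor_date()'s numeric week_start uses 1 = Monday ... 7 = Sunday, so take the
# ISO weekday of the day after end: weeks then start the day after end's
# weekday and therefore end on end's weekday
weekday = wday(end + 1, week_start = 1)
# floor each date to the start of its week, then total each week
data %>%
  mutate(week = floor_date(date, "week", week_start = weekday)) %>%
  group_by(week) %>%
  summarise(total = sum(val))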
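The date arithmetic behind this can also be shown without lubridate. As an illustration only (plain Python with the standard datetime module; weekly_sums() is a hypothetical helper, not part of tidyquant or lubridate), each date is moved forward to the next day that falls on end's weekday, which becomes the label of its week, like the week-end labels in the apply.weekly output above. Weeks with fewer than 7 distinct dates, normally the partial one at the start, are dropped.

```python
from datetime import date, timedelta


def weekly_sums(rows, end):
    """Sum (date, value) pairs into 7-day weeks that end on end's weekday.

    Each week is keyed by its last day. Weeks with fewer than 7 distinct
    dates (normally the partial week at the start) are dropped.
    """
    totals, seen = {}, {}
    for day, value in rows:
        if day > end:
            continue  # ignore anything after the run date
        # push the date forward to the next day (possibly itself) on end's weekday
        week_end = day + timedelta(days=(end - day).days % 7)
        totals[week_end] = totals.get(week_end, 0) + value
        seen.setdefault(week_end, set()).add(day)
    return {wk: total for wk, total in sorted(totals.items()) if len(seen[wk]) == 7}


# Example: "today" is Tuesday 2020-04-14 and the data starts on a Friday,
# so the first, 5-day week is dropped and every remaining week ends on a Tuesday.
end = date(2020, 4, 14)
rows = [(date(2020, 1, 3) + timedelta(days=i), 1) for i in range(103)]
for week_end, total in weekly_sums(rows, end).items():
    print(week_end, total)
```

The 7-date check counts the dates actually present, so a full week with a missing day in the source data would also be dropped.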
I have a SQLite database and want to create a query that groups records whose DateTime values are within 60 minutes of each other. The hard part is that the grouping is cumulative: each record only has to be within 60 minutes of the previous one, so the 3 records with DateTimes 2019-12-14 15:40:00, 2019-12-14 15:56:00 and 2019-12-14 16:55:00 all fall into one group, even though the first and last are 75 minutes apart. Please see the Hands table and the desired output of the query to understand the requirement.
Database Table "Hands"
ID DateTime Result
1 2019-12-14 15:40:00 -100
2 2019-12-14 15:56:00 1000
3 2019-12-14 16:55:00 -2000
4 2012-01-12 12:00:00 400
5 2016-10-01 21:00:00 900
6 2016-10-01 20:55:00 1000
Desired output of query
StartTime Count Result
2019-12-14 15:40:00 3 -1100
2012-01-12 12:00:00 1 400
2016-10-01 20:55:00 2 1900
You can use window functions to mark the record at which a new group starts (because its datetime differs from the previous one by 60 minutes or more), and then turn those marks into a unique group number with a running sum. Finally, you can group by that number and apply the aggregate functions:
with base as (
select DateTime, Result,
coalesce(cast((
julianday(DateTime) - julianday(
lag(DateTime) over (order by DateTime)
)
) * 24 >= 1 as integer), 1) as firstInGroup
from Hands
), step as (
select DateTime, Result,
sum(firstInGroup) over (
order by DateTime rows
between unbounded preceding and current row) as grp
from base
)
select min(DateTime) DateTime,
count(*) Count,
sum(Result) Result
from step
group by grp;
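A runnable sketch of the same gaps-and-islands query using Python's sqlite3 (window functions need SQLite 3.25.0 or later). One deliberate change from the answer: the gap is measured in whole seconds with strftime('%s', DateTime) instead of julianday() * 24, which sidesteps possible floating-point rounding when two hands are exactly 60 minutes apart. The helpers make_db() and group_sessions() are hypothetical names.

```python
import sqlite3

HANDS = [
    (1, "2019-12-14 15:40:00", -100), (2, "2019-12-14 15:56:00", 1000),
    (3, "2019-12-14 16:55:00", -2000), (4, "2012-01-12 12:00:00", 400),
    (5, "2016-10-01 21:00:00", 900), (6, "2016-10-01 20:55:00", 1000),
]

SESSIONS_SQL = """
WITH base AS (
    SELECT DateTime, Result,
           -- 1 when this hand starts a new group (first hand, or gap >= 60 min)
           CASE WHEN CAST(strftime('%s', DateTime) AS INTEGER)
                   - CAST(strftime('%s', LAG(DateTime) OVER (ORDER BY DateTime)) AS INTEGER)
                   < 3600
                THEN 0 ELSE 1 END AS firstInGroup
    FROM Hands
), step AS (
    SELECT DateTime, Result,
           -- running total of group starts = group number
           SUM(firstInGroup) OVER (ORDER BY DateTime
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM base
)
SELECT MIN(DateTime) AS StartTime, COUNT(*) AS Count, SUM(Result) AS Result
FROM step
GROUP BY grp
ORDER BY StartTime
"""


def make_db(hands):
    """Build an in-memory Hands table from (ID, DateTime, Result) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Hands (ID INTEGER PRIMARY KEY, DateTime TEXT, Result INTEGER)")
    conn.executemany("INSERT INTO Hands VALUES (?, ?, ?)", hands)
    return conn


def group_sessions(conn):
    """Return (StartTime, Count, Result) per chained 60-minute group."""
    return conn.execute(SESSIONS_SQL).fetchall()


for row in group_sessions(make_db(HANDS)):
    print(row)
```

The rows come back ordered by StartTime, so the 2012 and 2016 groups appear before the 2019 one, unlike the order shown in the desired output.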
I would like some help: the clinic has several doctors, and each one has specific office hours, for example 07:00 to 12:00, 12:00 to 17:00, 09:00 to 15:00 ... What is the SQL statement to display only the records for which the current time falls within the range given by start_time and end_time?
Fields:
start_time | end_time
07:00:00 | 12:30:00
09:00:00 | 15:00:00
12:30:00 | 17:00:00
07:00:00 | 17:00:00
That is, based on the current time: in the morning, display only the records that belong to the 07:00:00 to 12:30:00 range; in the afternoon, show only the records that belong to the 12:30:00 to 17:00:00 range.
Thanks.
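A possible starting point, assuming the goal is to list the rows whose start_time to end_time range contains the current time, and that times are stored as zero-padded HH:MM:SS text (so plain string comparison orders them correctly). The table name schedule and the helpers make_db() and on_duty() are hypothetical. The range is treated as half-open (start_time <= now < end_time), so at exactly 12:30:00 the 07:00:00 to 12:30:00 row no longer matches; when no time is passed, SQLite's time('now', 'localtime') supplies the current local time.

```python
import sqlite3

SHIFTS = [
    ("07:00:00", "12:30:00"), ("09:00:00", "15:00:00"),
    ("12:30:00", "17:00:00"), ("07:00:00", "17:00:00"),
]


def make_db(shifts):
    """Build an in-memory "schedule" table from (start_time, end_time) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schedule (start_time TEXT, end_time TEXT)")
    conn.executemany("INSERT INTO schedule VALUES (?, ?)", shifts)
    return conn


def on_duty(conn, now=None):
    """Rows whose [start_time, end_time) range contains now (HH:MM:SS).

    If now is None, SQLite's local wall-clock time is used instead.
    """
    sql = """
        SELECT start_time, end_time
        FROM schedule
        WHERE start_time <= COALESCE(:now, time('now', 'localtime'))
          AND COALESCE(:now, time('now', 'localtime')) < end_time
        ORDER BY start_time, end_time
    """
    return conn.execute(sql, {"now": now}).fetchall()


conn = make_db(SHIFTS)
print(on_duty(conn, "08:00:00"))
```

Note that this also returns overlapping ranges such as 07:00:00 to 17:00:00 whenever they contain the current time; a stricter morning/afternoon split would need an extra rule.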