SQL value not available anymore - sqlite

I'm using SQLite for the query you see below, and I need the average of all the interactions, including the "thankful" interaction. But I don't get any results (n/a), because "thankful" hasn't existed anymore since October 2022. What do I have to add to the code so that I don't get this error?
SELECT
  SUBSTR(facebookOwnPosts.time, 1, 7) AS month,
  ROUND(
    AVG(
      likes + love + wow + haha + sad + angry + comments + shares + thankful
    )
  ) AS engagementperpost
FROM
  facebookOwnPosts
GROUP BY
  month
I thought that if there is no data for one of the columns, it would just be left out.
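One likely cause, for what it's worth, is NULL propagation: if thankful is NULL for a row, the whole likes + ... + thankful expression becomes NULL, and AVG then ignores that row. A minimal sketch of one way around it, assuming the column still exists but holds NULLs for the months after October 2022 (if the column was dropped entirely, just remove it from the sum instead):
SELECT
  SUBSTR(facebookOwnPosts.time, 1, 7) AS month,
  ROUND(
    AVG(
      likes + love + wow + haha + sad + angry + comments + shares
        + COALESCE(thankful, 0)  -- treat missing "thankful" counts as 0
    )
  ) AS engagementperpost
FROM
  facebookOwnPosts
GROUP BY
  month;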

Related

What is a non-trivial way of finding solutions to systems of linear equations?

Hi!
I think I understood everything up to (2.40), but I don't see where the 0 = 8c1 + 2c2 - 1c3 + 0c4 came from. Where do the -1 and the 0 come from?
In general these types of questions are probably better suited for https://math.stackexchange.com/, but since I like math and linear algebra I'll answer your question here as well.
The system of equations that you have has 4 variables and only 2 equations, which means you're going to have more than one solution; in fact, infinitely many. Not just anything will work, though, so let's find out what the solutions look like.
To simplify writing this, let's call the 2x4 matrix A and the right-hand side b = (42, 8)^T, so what we are trying to solve is Ax = b. Also, to write the zero vector (0, 0)^T I'll use ⍬ to save typing.
They first find a particular solution (some xp such that A xp = b). They do this using the first 2 columns and 0's for the other columns: xp = (42, 8, 0, 0)^T, and when we plug this in we do get A xp = (42, 8)^T. Next they try to find the other solutions.
Notice that if we can find an x0 such that A x0 = ⍬, then we can add it to our xp and the result is still a solution: A(xp + x0) = A xp + A x0 = b + ⍬ = b. So let's see if we can find such an x0.
They do this in (2.40) by essentially saying: the 3rd and 4th columns are not used in xp, so let's see if we can cancel them using the first two columns. Really we're looking for anything that gives A x0 = ⍬; this is just one idea for how to find it.
Now notice that the 3rd column, (8, 2)^T, can be written as 8 times the first column plus 2 times the second column. So if we take 8 times the first column + 2 times the second column - 1 times the third column (+ zero times the fourth column), we get zero. That is exactly x0 = (8, 2, -1, 0)^T, because A x0 = A(8, 2, -1, 0)^T = ⍬. Similarly, using the fourth column we can find another one of these: (-4, 12, 0, -1)^T, another independent vector, which I'll call x02 because I can't think of a better notation at the moment. Both of these satisfy A x0 = ⍬ and A x02 = ⍬.
To make this really concrete, you can check that computing A(xp + x0) = A(50, 10, -1, 0)^T really does give you (42, 8)^T.
So we can add either or both of them to our original xp and it will still be a solution that gives us the b we're looking for: A(xp + x0 + x02) = A xp + A x0 + A x02 = b + ⍬ + ⍬ = b. Also, any multiple of x0 or x02 works as well, because for example A(3*x0) = 3*A x0 = 3*⍬ = ⍬.
So really A(xp + c*x0 + d*x02) = A xp + c*A x0 + d*A x02 = b + c*⍬ + d*⍬ = b, which means that any vector of the form xp + c*x0 + d*x02, where c and d are any numbers (scalars), is a solution. This is our solution set and is what the last part (2.43) is saying.
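Written out with the actual vectors plugged in, that solution set is
x = xp + c*x0 + d*x02 = (42, 8, 0, 0)^T + c*(8, 2, -1, 0)^T + d*(-4, 12, 0, -1)^T,
for any scalars c and d.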

PL/SQL - need to do some conditional calculations

I am a beginner in PL/SQL, so don't be too harsh.
I have a table with Column_A (current month amount) and Column_B (previous month amount), both numbers. I need to write a condition for some calculations: Column_A - Column_B = result. If result > 0 (meaning there is an increase in the current month compared to the previous one), then add the result to Column_A.
I don't know how to write this one.
You can try a query like the one below.
UPDATE your_table
SET column_A = (
  CASE
    WHEN (column_A - column_B) > 0 THEN (column_A + (column_A - column_B))
    ELSE column_A
  END
)
This will check all records, and for those where the difference is greater than zero it will update column_A with the result, which is the sum of column_A and the difference.
Hope this helps. I wish you great learning!
Edited:
Well, if you are just trying to manipulate the data for display, then you can simplify your query as below; it provides the same functionality.
SELECT
  (CASE
     WHEN (Current_month_amount - previous_month_amount) > 0
       THEN (Current_month_amount + (Current_month_amount - previous_month_amount))
     ELSE Current_month_amount
   END) AS Current_month_amount,
  previous_month_amount,
  (Current_month_amount - previous_month_amount) AS Amount_Difference
FROM table_1
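If the goal really is to persist the change rather than just display it, the original UPDATE can also be written with a WHERE clause so that only the rows that actually change are touched (a sketch, assuming the same column names as above):
UPDATE your_table
SET column_A = column_A + (column_A - column_B)
WHERE (column_A - column_B) > 0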

Syntax in R, Overtime Pay

I am an intro-to-computer-science student; I've learned how to use Python and am now learning R. I'm not used to R, and I've figured out how to calculate overtime pay, but I'm not sure what is wrong with my syntax:
computePay <- function(pay,hours){
}if (hours)>=40{
newpay = 40-hours
total=pay*1.5
return(pay*40)+newpay*total
}else{
return (pay * hours)
}
How would I code this correctly?
Without looking at things like vectorization, a direct correction of your function would look something like:
computePay <- function(pay, hours) {
  if (hours >= 40) {
    newpay = hours - 40
    total = pay * 1.5
    return(pay * 40 + newpay * total)
  } else {
    return(pay * hours)
  }
}
This supports calling the function with a single pay and a single hours. You mis-calculated newpay (which should really be named something like overhours); I corrected it.
You may hear people talk about "avoiding magic constants". A "magic constant" is a hard-coded number within code whose meaning is not perfectly clear and/or which the caller might reasonably want to change. For instance, in some contracts overtime starts at a number other than 40, so that could be configurable. You can do that by changing the formals to:
computePay <- function(pay, hours, overtime_hours = 40, overtime_factor = 1.5)
and using those variables instead of the hard-coded numbers. This lets the caller specify other values; if none are provided, the function falls back to sane defaults.
Furthermore, it might be useful to call it with a vector for one or the other argument, in which case the current function will fail because if (hours >= 40) needs a single logical value, but (e.g.) c(40,50) >= 40 returns a logical vector of length 2. We handle this by introducing the ifelse function. Though it has some gotchas in advanced usage, it should work just fine here:
computePay1 <- function(pay, hours, overtime_hours = 40, overtime_factor = 1.5) {
  ifelse(hours >= overtime_hours,
         overtime_hours * pay + (hours - overtime_hours) * overtime_factor * pay,
         pay * hours)
}
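For example (hypothetical numbers, just to show the vectorized behaviour):
computePay1(pay = 10, hours = c(30, 45))
# 30 hours -> 10 * 30 = 300
# 45 hours -> 40 * 10 + 5 * 1.5 * 10 = 475
# [1] 300 475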
Because of those gotchas and the readability of deeply nested calls (I've seen ifelse stacked 12 levels deep), some people prefer other solutions. If you look at it more closely, you may find that you can take further advantage of vectorization with pmax, which is max applied piece-wise over each element. (Note the difference between max(c(1,3,5), c(2,4,4)) and pmax(c(1,3,5), c(2,4,4)).)
Try something like this:
computePay2 <- function(pay, hours, overtime_hours = 40, overtime_factor = 1.5) {
  pmax(0, hours - overtime_hours) * overtime_factor * pay +
    pmin(hours, overtime_hours) * pay
}
To show how this works, I'll expand the pmax and pmin components:
hours <- c(20, 39, 41, 50)
overtime_hours <- 40
pmax(0, hours - overtime_hours)
# [1] 0 0 1 10
pmin(hours, overtime_hours)
# [1] 20 39 40 40
The rest sorts itself out.
Your "newpay*total" expression is outside the return command. You need put it inside the parentheses. The end bracket at the beginning of the second line should be moved to the last line. You also should have "(hours>=40)" rather than "(hours)>=40". Stylistically, the variable names are poorly chosen and there's no indentation (this might have helped you notice the misplaced bracket). Also, the calculation can be simplified:
total_pay = hourly_wage*(hours+max(0,hours-40)/2))
For every hour you work, you get your hourly wage. For every hour over 40 hours, you get your hourly wage plus half your hourly wage. So the total pay is wage*(total hours + (hours over 40)/2). Hours over 40 is either going to be total hours minus 40, or zero, whichever is larger.
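Wrapped into a function for comparison (computePay3 and its argument names are just illustrative, not from the original code):
computePay3 <- function(hourly_wage, hours) {
  # same simplified formula as above, for a single hours value
  # (swap max for pmax if you want it to accept a vector of hours)
  hourly_wage * (hours + max(0, hours - 40) / 2)
}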

Firebase exported to BigQuery: retention cohorts query

Firebase offers split-testing functionality through Firebase Remote Config, but there is no way to filter retention in the cohorts section by user properties (by any property, in fact).
Looking for a solution to this problem, I turned to BigQuery, since Firebase Analytics provides a usable way to export its data to that service.
But I'm stuck with many questions, and Google has no answer or example that points me in the right direction.
General questions:
As a first step I need to aggregate data that reproduces what Firebase cohorts show, so I can be sure my calculation is right.
The next step should be simply applying constraints to the queries so that they match custom user properties.
Here is what I have so far:
The main problem is the big difference in user counts: sometimes the difference is about 100 users, but sometimes it is close to 1000.
This is the approach I use:
# 1
# Count users with `user_dim.first_open_timestamp_micros`
# in the specified period (w0 – week 1)
# This is the way Firebase groups users into cohorts
# (users who started the app on the same day or during the same week)
# https://support.google.com/firebase/answer/6317510
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-20'),
TIMESTAMP('2016-11-26')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'
# 2
# For each next period count events with
# same first_open_timestamp
# Here is an example for one of the weeks.
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-27'),
TIMESTAMP('2016-12-03')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'
# 3
# Now we have users for each week w1, w2, ... w5
# Calculate retention for each of them
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w1 * 100
# ...
# rw5 = w5 / w1 * 100
# 4
# Shift week 0 by one and repeat from step 1
BigQuery query tips request
Any tips or directions on building a complex query that can aggregate and calculate all the data required for this task in one step are very much appreciated.
Here is the BigQuery Export schema, if needed.
Side questions:
Why are user_dim.device_info.device_id and user_dim.device_info.resettable_device_id always null?
user_dim.app_info.app_id is missing from the doc (in case a Firebase support teammate reads this question).
How should event_dim.timestamp_micros and event_dim.previous_timestamp_micros be used? I can't work out their purpose.
PS
It would be good if someone from the Firebase team answered this question. Five months ago there was a mention of extending the cohorts functionality with filtering, or of showing BigQuery examples, but things are not moving. Firebase Analytics is the way to go, they said; Google Analytics is deprecated, they said.
I've now spent a second day learning BigQuery and building my own solution on top of the existing analytics tools. I know Stack Overflow is not the place for these comments, but guys, what are you thinking? Split testing may dramatically affect the retention of my app. My app doesn't sell anything, so funnels and events are not valuable metrics in many cases.
Any tips or directions on building a complex query that can aggregate and calculate all the data required for this task in one step are very much appreciated.
Yes, generic BigQuery will work fine for this.
Below is not the most generic version, but it can give you an idea.
In this example I am using the Stack Overflow data available in the Google BigQuery Public Datasets.
The first sub-select – activities – is in most cases the only part you need to rewrite to reflect the specifics of your data.
What it does:
a. It defines the period you want to use for the analysis.
In the example below it is a month - FORMAT_DATE('%Y-%m', ...)
But you can use a year, week, day or anything else – respectively:
• By year - FORMAT_DATE('%Y', DATE(answers.creation_date)) AS period
• By week - FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS period
• By day - FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS period
• …
b. It also "filters" only the type of events/activity you need to analyse.
For example, WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' looks for answers to questions tagged google-bigquery.
The rest of the sub-queries are more or less generic and can mostly be used as is.
#standardSQL
WITH activities AS (
SELECT answers.owner_user_id AS id,
FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
ON questions.id = answers.parent_id
WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%'
GROUP BY id, period
), cohorts AS (
SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
), periods AS (
SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
FROM (SELECT DISTINCT cohort AS period FROM cohorts)
), cohorts_size AS (
SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids
FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
JOIN periods ON periods.period = cohorts.cohort
GROUP BY cohort, num
), retention AS (
SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
FROM periods JOIN activities ON activities.period = periods.period
JOIN cohorts ON cohorts.id = activities.id
GROUP BY cohort, period, num
)
SELECT
CONCAT(cohorts_size.cohort, ' - ', FORMAT("%'d", cohorts_size.ids), ' users') AS cohort,
retention.num - cohorts_size.num AS period_lag,
retention.period as period_label,
ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention , retention.ids AS rids
FROM retention
JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
ORDER BY cohort, period_lag, period_label
You can visualize the result of the above query with the tool of your choice.
Note: you can use either period_lag or period_label.
See the difference in their use in the examples below:
with period_lag
with period_label
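Finally, to adapt this to the Firebase export tables from the question, only the activities sub-select should need to change. A rough, untested sketch using the fields mentioned in the question (the app_events_* daily tables, user_dim.app_info.app_instance_id and event_dim.timestamp_micros); the table reference and field paths are taken from the question and may need adjusting to your actual schema:
#standardSQL
WITH activities AS (
  SELECT
    user_dim.app_info.app_instance_id AS id,
    -- group activity into weekly periods, like the weekly cohorts in the question
    FORMAT_DATE('%Y-%W', DATE(TIMESTAMP_MICROS(event.timestamp_micros))) AS period
  FROM `admob-app-id-xx.xx_IOS.app_events_*`,
    UNNEST(event_dim) AS event
  GROUP BY id, period
)
-- the cohorts, periods, cohorts_size and retention sub-queries
-- from the query above stay exactly the same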

MomentJS - making fromNow() round down

I was using the following code to calculate the age of a person:
var year = 1964;
var month = 1;
var day = 20;
var age = moment(year + '-' + month + '-' + day, 'YYYY-MM-DD').fromNow(true);
The problem with fromNow() is that it rounds the number up or down depending on the decimal part. I would like it to only round down. In the above example the person's real age is 51, but it returns 52 because his age is actually something like 51.75.
If I use diff() instead, it rounds down, which is perfect. But it doesn't give me the pretty text "51 years old".
var age = moment().diff([year, month - 1, day], 'years');
My question is, is there a way to make fromNow() round down?
You can configure a custom rounding function:
moment.relativeTimeRounding(Math.floor)
The provided solution is correct, but I thought I'd add a little explanation since this was the first Google result.
Say I have a scenario where I want dates that were 1m 30s ago to display "a minute ago" rather than "2 minutes ago":
const minuteAndAHalfAgo = new Date();
minuteAndAHalfAgo.setMinutes(minuteAndAHalfAgo.getMinutes() - 1);
minuteAndAHalfAgo.setSeconds(minuteAndAHalfAgo.getSeconds() - 30)
moment.relativeTimeRounding(Math.floor);
console.log(moment(minuteAndAHalfAgo).fromNow()); // a minute ago
The relativeTimeRounding function takes a function as an argument, in our case Math.floor, which means the relative time will be rounded down. This is covered in the docs at https://momentjs.com/docs/#/customization/relative-time-rounding/. You can also specify a relativeTimeThreshold, the threshold at which one unit of time rolls over to the next.
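For completeness, a small sketch of the threshold option as well (the 45 below is just an illustrative value, not something from the answer above):
// anything under 45 seconds displays as "a few seconds ago"
moment.relativeTimeThreshold('s', 45);
// and round relative times down instead of to the nearest unit
moment.relativeTimeRounding(Math.floor);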
