How do I compare multiple rows and get the min of date in a new column? - sqlcompare

COMPANY  PROD   ID      DATE      NAME    REQD_DATE
XYZ      12345  AAA111  7/1/2011  PETER
XYZ      12345  PPP222  7/1/2002  JOHN
MNS      67890  ZZZ999  9/1/2005  STEVE
MNS      67890  DDD555  9/1/2012  MARTIN
Given the table above (say, '#temp'): whenever COMPANY and PROD are the same, I want to put the minimum date, min(DATE), for that group into the column 'REQD_DATE'.
Can someone please help me out here?
Thanks in advance!
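One possible approach is a correlated subquery that looks up the earliest date within each (COMPANY, PROD) group. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the asker's SQL Server temp table, the table name temp_rows is hypothetical, and the dates are rewritten as ISO strings (YYYY-MM-DD) so that MIN() compares them chronologically.

```python
import sqlite3

# In-memory database standing in for the asker's '#temp' table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_rows ("
    "company TEXT, prod INTEGER, id TEXT, date TEXT, name TEXT, reqd_date TEXT)"
)

# Sample rows from the question; dates converted to ISO format (assumption)
# so that string comparison matches chronological order.
rows = [
    ("XYZ", 12345, "AAA111", "2011-07-01", "PETER"),
    ("XYZ", 12345, "PPP222", "2002-07-01", "JOHN"),
    ("MNS", 67890, "ZZZ999", "2005-09-01", "STEVE"),
    ("MNS", 67890, "DDD555", "2012-09-01", "MARTIN"),
]
conn.executemany(
    "INSERT INTO temp_rows (company, prod, id, date, name) VALUES (?, ?, ?, ?, ?)",
    rows,
)

# For every row, store the earliest date among rows sharing company and prod.
conn.execute(
    """
    UPDATE temp_rows
    SET reqd_date = (
        SELECT MIN(t2.date)
        FROM temp_rows AS t2
        WHERE t2.company = temp_rows.company
          AND t2.prod = temp_rows.prod
    )
    """
)

result = conn.execute(
    "SELECT id, reqd_date FROM temp_rows ORDER BY id"
).fetchall()
for row in result:
    print(row)
```

In SQL Server, with DATE stored as a real date type, a window function such as MIN(DATE) OVER (PARTITION BY COMPANY, PROD) would likely express the same idea without a subquery.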

Related

Calculating customer retention between months SQLite

I need to calculate the customer retention between months. My table is as such
year_month  customer_id
2022-05     abc
2022-05     asd
2022-05     xyz
2022-06     abc
2022-06     xyz
2022-06     qwe
2022-07     abc
2022-07     asd
I need to get an output such that if a customer id showed up in both the current and previous month, then they will be considered as a retained customer.
For instance, as abc and xyz are in both 2022-05 and 2022-06, thus there were 2 customers retained in 2022-06. asd is not included in retention as it does not appear in consecutive months.
Below is an example of the output.
Expected output:
year_month  customers_retained
2022-06     2
2022-07     1
Parse the date into a running month number first, then self-join each month to the previous one on customer_id and group by month.
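WITH data_ex AS (
    SELECT
        -- year * 12 + month, so consecutive months differ by exactly 1
        substr(year_month, 1, 4) * 12 + substr(year_month, 6, 2) AS month,
        year_month,
        customer_id
    FROM data
)
-- the inner join keeps only customers present in both months, so count(*) is the retained count
SELECT data_ex.year_month, count(*) AS customers_retained
FROM data_ex
JOIN data_ex AS prev
  ON data_ex.month = prev.month + 1
 AND data_ex.customer_id = prev.customer_id
GROUP BY data_ex.month, data_ex.year_month;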
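To check the self-join idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name data follows the answer; the aliases cur and prev, the column name month_no, and the explicit CAST calls are my own choices for illustration, and COUNT(DISTINCT ...) is used on the assumption that a customer should count once per month even if listed twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (year_month TEXT, customer_id TEXT)")

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("2022-05", "abc"), ("2022-05", "asd"), ("2022-05", "xyz"),
        ("2022-06", "abc"), ("2022-06", "xyz"), ("2022-06", "qwe"),
        ("2022-07", "abc"), ("2022-07", "asd"),
    ],
)

query = """
WITH data_ex AS (
    SELECT
        -- running month number: consecutive months differ by exactly 1
        CAST(substr(year_month, 1, 4) AS INTEGER) * 12
          + CAST(substr(year_month, 6, 2) AS INTEGER) AS month_no,
        year_month,
        customer_id
    FROM data
)
SELECT cur.year_month,
       COUNT(DISTINCT cur.customer_id) AS customers_retained
FROM data_ex AS cur
JOIN data_ex AS prev
  ON cur.month_no = prev.month_no + 1
 AND cur.customer_id = prev.customer_id
GROUP BY cur.year_month
ORDER BY cur.year_month
"""

retention = conn.execute(query).fetchall()
for year_month, retained in retention:
    print(year_month, retained)
```

Months with no retained customers do not appear in this result, because the inner join produces no rows for them; a LEFT JOIN from a list of months would be needed to show them with a count of 0.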

How to replace "Unknowns" in R

id          Date created  gender     age
5uwns89zht  7/1/2014      FEMALE     35
jtl0dijy2j  7/1/2014      -unknown-  -
xx0ulgorjt  7/1/2014      -unknown-  -
6c6puo6ix0  7/1/2014      -unknown-  -
czqhjk3yfe  7/1/2014      -unknown-  -
Hi,
I want to understand how to replace the "-unknown-" values in the gender column with NULL or NA, and how to fill in the missing values in age.
I tried to replace the unknowns as follows:
traindata_z$gender<-replace('-unknown-', np.nan, inplace = TRUE)
For the missing age values, I am not sure what code to use.
Could you help me with this please?
Thanks.
Try this:
traindata_z$gender <- gsub("-unknown-", NA, traindata_z$gender)
# gsub() returns a character vector, so convert age back to numeric afterwards
traindata_z$age <- as.numeric(gsub("-", NA, traindata_z$age))
You can use the following options.
# option 1
traindata_z$gender[traindata_z$gender == "-unknown-"] <- NA
# option 2 (as.character() guards against ifelse() returning factor codes)
traindata_z$gender <- ifelse(traindata_z$gender == "-unknown-",
                             NA, as.character(traindata_z$gender))

Keep all logs from users who have a specific log (R language)

I have my table like this(input):
User Event
Mike error
Mike buy
Bony error
Bony like
Mike rate
Mike like
I need to keep all logs from users who have 'rate' in Event (output):
User Event
Mike error
Mike buy
Mike rate
Mike like
Thanks for help!
A dplyr solution:
library(dplyr)
df %>%
  group_by(User) %>%
  filter(sum(Event == 'rate') > 0)
# User Event
# <fctr> <fctr>
#1 Mike error
#2 Mike buy
#3 Mike rate
#4 Mike like

Getting multiple sub-totals & group_by in a table form in R

I have a dataset (a CSV) of phone calls. It contains several columns, but the important ones are "Persons calling" and "Persons called". Both columns hold strings (names), and all the work is done on these two columns. For example:
Caller  Receiver
Alice   Mary
Kate    Betty
Alice   Betty
Mary    Kate | Jane
Jane    Alice
The desired output is the number of calls each person made and who they called. For the data above, it would look like this:
Caller  Receiver  CallFreq
Alice   Mary      1
        Betty     1
Kate    Betty     1
Mary    Kate      1
        Jane      1
Jane    Alice     1
The Total calls made by the person could be included in the above table or in another table.
The unnest function from the tidyr package is very useful in this case.
library(dplyr)
library(tidyr)
output <-
  mydata %>%
  group_by(Caller) %>%
  # collect each caller's distinct receivers into one ' | ' separated string
  summarise(Receiver = paste(unique(Receiver), collapse = ' | ')) %>%
  # split that string into a list column, then expand to one row per receiver
  mutate(Receiver = strsplit(Receiver, ' \\| ')) %>%
  unnest(Receiver) %>%
  group_by(Caller) %>%
  mutate(CallFreq = 1, TotalCalls = n_distinct(Receiver))
The code above needs the dplyr and tidyr packages (dplyr re-exports the %>% pipe from magrittr).

Extract column name and specific value based on a condition

Suppose that I have the following data frame:
firstname <- c('Doug','Tom','Glenn','Billy','Angelo')
city <- c('Tulsa','Unknown','Miami','Houston','Unknown')
state <- c('OK','CA','FL','Unknown','Unknown')
job <- c('Unknown','Plumber','Professor','Unknown','Unknown')
list_test <- data.frame(firstname, city, state, job)
I want to extract the firstname and column names where one of the columns is Unknown. In other words, I want a table that looks like this:
firstname attribute
Doug job
Tom city
Billy state
Billy job
Angelo city
Angelo state
Angelo job
library(reshape2)
library(dplyr)
# each %>% must end its line; a line that starts with %>% is a syntax error
list_test %>%
  melt(id.vars = 'firstname', variable.name = 'attribute') %>%
  filter(value == 'Unknown') %>%
  select(-value)
#   firstname attribute
# 1       Tom      city
# 2    Angelo      city
# 3     Billy     state
# 4    Angelo     state
# 5      Doug       job
# 6     Billy       job
# 7    Angelo       job
You can loop through the names of the columns you want to process, building a data frame with all the first names that are missing that attribute. Then you can combine them all with do.call and rbind:
do.call(rbind, lapply(tail(names(list_test), -1), function(x) {
  data.frame(firstname = list_test$firstname[list_test[, x] == "Unknown"], attribute = x)
}))
# firstname attribute
# 1 Tom city
# 2 Angelo city
# 3 Billy state
# 4 Angelo state
# 5 Doug job
# 6 Billy job
# 7 Angelo job
Solution without loops; probably scales better for larger datasets.
library(reshape2)
# transform to long format
m_l <- melt(list_test, id.vars = "firstname", factorsAsStrings = TRUE)
# ignore warning; expected
# make selection
res <- m_l[m_l$value == "Unknown", -3]
# order (for completeness' sake)
res[order(res$firstname), ]
#    firstname variable
# 5     Angelo     city
# 10    Angelo    state
# 15    Angelo      job
# 9      Billy    state
# 14     Billy      job
# 11      Doug      job
# 2        Tom     city
Adding a tidyr and dplyr solution, which I find much more elegant:
library(dplyr)
library(tidyr)
list_test %>%
  gather(field, value, -firstname) %>%
  filter(value == "Unknown") %>%
  select(-value) %>%
  arrange(firstname)
The last two lines are purely cosmetic. You can ignore the warning about dropping attributes; it only tells you that a factor was converted to a character vector.
Another simple option using tidyr's gather and base R subset
library(tidyr)
subset(gather(list_test, "firstname"), value == "Unknown")
# firstname firstname.1 value
#2 Tom city Unknown
#5 Angelo city Unknown
#9 Billy state Unknown
#10 Angelo state Unknown
#11 Doug job Unknown
#14 Billy job Unknown
#15 Angelo job Unknown
A data.table example:
library(data.table)
list_test <- data.table(firstname, city, state, job)
varlist <- names(list_test)[2:4]
do.call(rbind,sapply(varlist, function(x) list_test[get(x)=='Unknown',list(firstname,col = x)], simplify=FALSE))
It's a little messy - I'm hoping someone could suggest a better data.table approach.
