I have two tables:
table "A" with various items identified by item codes (integer). each item appears several times, with different upload dates. the tables also show the store under which the item is sold (store ID- integer)
table "B" with a list of the desired item codes (50 items) to draw.
I am interested in extracting (showing) items from the first table, according to the item codes on the second table. the items chosen should also have the highest upload date and belong to a specific store id (as I choose).
for example: the item "rice", has an item code of - 77893. this item code is on table "B", meaning I want to show it. in table "A", there are multiple entries for "rice":
table exapmle
table "A":
item_name | item_code | upload_date | store_id
rice | 77893 | 2021-11-18 | 001
rice | 77893 | 2020-05-30 | 011
rice | 77893 | 2020-11-02 | 002
apple | 90837 | 2020-05-14 | 002
apple | 90837 | 2020-05-14 | 020
rice | 77893 | 2020-05-15 | 002
apple | 90837 | 2020-01-08 | 002
rice | 77893 | 2020-05-15 | 005
table "B":
item_code
90837
77893
output:
item_name | item_code | upload_date | store_id
rice | 77893 | 2020-11-02 | 002
apple | 90837 | 2020-05-14 | 002
"rice" and "apple" have item codes that are also on table "B". in this example, I am interested in items that are sold at store 002.
so far I only managed to return the item by its latest upload date. however, I inserted the item code manually and also was not able to filter store_id's.
Any help or guidelines on how to execute this idea will be very helpful.
Thank you!
Filter the rows of table A for the store you want and the items listed in table B, then aggregate to get the max date per item:
SELECT item_name, MAX(upload_date) AS upload_date, store_id
FROM A
WHERE store_id = '002' AND item_code IN (SELECT item_code FROM B)
GROUP BY item_name, item_code, store_id;
or, with a join:
SELECT A.item_name, MAX(A.upload_date) AS upload_date, A.store_id
FROM A
INNER JOIN B ON B.item_code = A.item_code
WHERE A.store_id = '002'
GROUP BY A.item_name, A.item_code, A.store_id;
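If you later need other columns from the latest row per item (where a MAX() per group is not enough), a window-function variant works too; this is a sketch, assuming MySQL 8+ or MariaDB 10.2+:
SELECT item_name, item_code, upload_date, store_id
FROM (
    SELECT A.*,
           ROW_NUMBER() OVER (PARTITION BY A.item_code
                              ORDER BY A.upload_date DESC) AS rn
    FROM A
    INNER JOIN B ON B.item_code = A.item_code
    WHERE A.store_id = '002'
) AS t
WHERE rn = 1;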
I have a MariaDB database with users and their registration dates, something like:
+----+----------+----------------------+
| ID | Username | RegistrationDatetime |
+----+----------+----------------------+
| 1 | A | 2022-01-03 12:00:00 |
| 2 | B | 2022-01-03 14:00:00 |
| 3 | C | 2022-01-04 23:00:00 |
| 4 | D | 2022-01-04 14:00:00 |
| 5 | E | 2022-01-05 14:00:00 |
+----+----------+----------------------+
I want to know the total number of users in the system at the end of every date with just one query - is that possible?
So the result should be something like:
+------------+-------+
| Date | Count |
+------------+-------+
| 2022-01-03 | 2 |
| 2022-01-04 | 4 |
| 2022-01-05 | 5 |
+------------+-------+
Yes, it's easy with multiple queries and a PHP loop over the dates, but how can it be done with just one query?
EDIT
Thanks for all the replies. Yes, users can get cancelled/deleted, i.e. going by MAX(ID) for a specific time period is NOT possible; there can be gaps in the ID column.
Use the COUNT() window function. With ORDER BY in the window, rows sharing a date are peers and are all included in the running count, and DISTINCT then collapses them to one row per date:
SELECT DISTINCT
    DATE(RegistrationDatetime) AS Date,
    COUNT(*) OVER (ORDER BY DATE(RegistrationDatetime)) AS Count
FROM tablename;
SELECT
    date(RegistrationDatetime),
    sum(count(*)) over (order by date(RegistrationDatetime))
FROM
    mytable
GROUP BY
    date(RegistrationDatetime);
output:
date(RegistrationDatetime) | sum(count(*)) over (order by date(RegistrationDatetime))
2022-01-03                 | 2
2022-01-04                 | 4
2022-01-05                 | 5
SELECT DATE(t1.RegistrationDatetime) AS Date,
       (SELECT COUNT(*) FROM users t2
        WHERE DATE(t2.RegistrationDatetime) <= DATE(t1.RegistrationDatetime)) AS Count
FROM users t1
GROUP BY DATE(t1.RegistrationDatetime)
If you have no cancelled users, you can do:
SELECT DATE(RegistrationDatetime) AS date_, MAX(Id) AS cnt
FROM tab
GROUP BY DATE(RegistrationDatetime)
Otherwise, you may need to use ROW_NUMBER() to generate that ranking:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY RegistrationDatetime) AS rn
    FROM tab
)
SELECT DATE(RegistrationDatetime) AS date_, MAX(rn) AS cnt
FROM cte
GROUP BY DATE(RegistrationDatetime)
I am trying (unsuccessfully) to do the equivalent of an HLOOKUP nested within a VLOOKUP in Excel, using RStudio.
Here is the situation.
I have two tables. Table 1 has historical stock prices, where each column represents a ticker name and each row represents a particular date. Table 1 contains the closing stock price for each ticker on each date.
Assume Table 1 looks like this:
|----------|------|------|-----|
| Date     | MSFT | AMZN | EPD |
|----------|------|------|-----|
| 6/1/2020 |  196 | 2600 |  19 |
| 5/1/2020 |  186 | 2200 |  20 |
| 4/1/2020 |  176 | 2000 |  15 |
| 3/1/2020 |  166 | 1800 |  14 |
| 2/1/2020 |  170 | 2200 |  18 |
| 1/1/2020 |  180 | 2300 |  17 |
|----------|------|------|-----|
Table 2 has a list of ticker symbols, as well as two dates and placeholders for the stock price on each date. Date1 is always an earlier date than Date2, and each of Date1 and Date2 corresponds with a date in Table 1. Note that Date1 and Date2 are different for each row of Table 2.
My objective is to pull the applicable PriceOnDate1 and PriceOnDate2 into Table 2, similar to the VLOOKUP/HLOOKUP functions in Excel. (I can't use Excel going forward on this, as the file is too big for Excel to handle.) Then I can calculate the return for each row with a formula like this: (PriceOnDate2 - PriceOnDate1) / PriceOnDate1
Assume I want Table 2 to look like this, but I am unable to pull in the pricing data for PriceOnDate1 and PriceOnDate2:
|-----------------------------------------------------------|
| Ticker | Date1 | Date2 |PriceOnDate1 |PriceOnDate2 |
|-----------------------------------------------------------|
| MSFT | 1/1/2020 | 4/1/2020 | _________ | ________ |
| MSFT | 2/1/2020 | 6/1/2020 | _________ | ________ |
| AMZN | 5/1/2020 | 6/1/2020 | _________ | ________ |
| EPD | 1/1/2020 | 3/1/2020 | _________ | ________ |
| EPD | 1/1/2020 | 4/1/2020 | _________ | ________ |
|-----------------------------------------------------------|
My question is whether there is a way to use R to pull into Table 2 the closing price data from Table 1 for each Date1 and Date2 in each row of Table 2. For instance, in the first row of Table 2, ideally the R code would pull in 180 for PriceOnDate1 and 176 for PriceOnDate2.
I've tried searching for answers, but I've been unable to craft a solution that does this in R. Can anyone please help? I greatly appreciate your time. Thank you!
Working in something like R requires you to think about the data a bit differently. Your Table 1 is easiest to work with pivoted into long format; you can then join on Ticker and Date to pull the values you want.
Data:
table_1 <- data.frame(Date = c("6/1/2020", "5/1/2020", "4/1/2020",
                               "3/1/2020", "2/1/2020", "1/1/2020"),
                      MSFT = c(196, 186, 176, 166, 170, 180),
                      AMZN = c(2600, 2200, 2000, 1800, 2200, 2300),
                      EPD  = c(19, 20, 15, 14, 18, 17))

# only created part of Table 2
table_2 <- data.frame(Ticker = c("MSFT", "AMZN"),
                      Date1  = c("1/1/2020", "5/1/2020"),
                      Date2  = c("4/1/2020", "6/1/2020"))
Solution:
The tidyverse approach is pretty easy here.
library(dplyr)
library(tidyr)
First, pivot Table 1 to be longer.
table_1_long <- table_1 %>%
  pivot_longer(-Date, names_to = "Ticker", values_to = "Price")
Then join in the prices that you want by matching the Date and Ticker.
table_2 %>%
  left_join(table_1_long, by = c(Date1 = "Date", "Ticker")) %>%
  left_join(table_1_long, by = c(Date2 = "Date", "Ticker")) %>%
  rename(PriceOnDate1 = Price.x,
         PriceOnDate2 = Price.y)
# Ticker Date1 Date2 PriceOnDate1 PriceOnDate2
# 1 MSFT 1/1/2020 4/1/2020 180 176
# 2 AMZN 5/1/2020 6/1/2020 2200 2600
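From there, the return described in the question is one mutate() away; a short sketch, where table_2_prices is just a name introduced here for the joined result above:
table_2_prices <- table_2 %>%
  left_join(table_1_long, by = c(Date1 = "Date", "Ticker")) %>%
  left_join(table_1_long, by = c(Date2 = "Date", "Ticker")) %>%
  rename(PriceOnDate1 = Price.x, PriceOnDate2 = Price.y)

# (PriceOnDate2 - PriceOnDate1) / PriceOnDate1, per the question's formula
table_2_prices %>%
  mutate(Return = (PriceOnDate2 - PriceOnDate1) / PriceOnDate1)
#   Ticker    Date1    Date2 PriceOnDate1 PriceOnDate2      Return
# 1   MSFT 1/1/2020 4/1/2020          180          176 -0.02222222
# 2   AMZN 5/1/2020 6/1/2020         2200         2600  0.18181818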
The mapply() function would do it here. Let's say your first table is stored in a data.frame called df and the second in a data.frame called df2:
df2$PriceOnDate1 <- mapply(function(ticker, date) df[[ticker]][df$Date == date],
                           df2$Ticker, df2$Date1)
df2$PriceOnDate2 <- mapply(function(ticker, date) df[[ticker]][df$Date == date],
                           df2$Ticker, df2$Date2)
In this code, the HLOOKUP is the double bracket ([[), which returns the column with that name; the VLOOKUP is the single bracket ([), which returns the value at a certain position.
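To see it run, you can reuse the table_1/table_2 sample data from the answer above (an assumption here, since this answer leaves df and df2 abstract):
df <- table_1   # wide price table, one column per ticker
df2 <- table_2  # lookup rows: Ticker, Date1, Date2
# if the columns are factors (the default before R 4.0), convert them first,
# since == comparison and [[ lookup need character values
df$Date <- as.character(df$Date)
df2[] <- lapply(df2, as.character)
Running the two mapply() calls above then gives:
df2
#   Ticker    Date1    Date2 PriceOnDate1 PriceOnDate2
# 1   MSFT 1/1/2020 4/1/2020          180          176
# 2   AMZN 5/1/2020 6/1/2020         2200         2600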
This can be done with a single join if both data frames are in long format, followed by a pivot_wider to get the desired final shape.
The code below uses @Adam's sample data. Note that in the sample data the dates are coded as factors; you'll probably want your dates coded as R's Date class in your real data.
library(tidyverse)
table_2 %>%
  pivot_longer(-Ticker, values_to = "Date") %>%
  left_join(
    table_1 %>%
      pivot_longer(-Date, names_to = "Ticker", values_to = "Price")
  ) %>%
  pivot_wider(names_from = name, values_from = c(Date, Price)) %>%
  rename_all(~gsub("Date_", "", .))
Ticker Date1 Date2 Price_Date1 Price_Date2
1 MSFT 1/1/2020 4/1/2020 180 176
2 AMZN 5/1/2020 6/1/2020 2200 2600
I'm trying to filter a data.frame with family information. It looks like this:
+--------+-------+---------+
| name | dad | mom |
+--------+-------+---------+
| john | bert | ernie |
| quincy | adam | eve |
| anna | david | goliath |
| daniel | bert | ernie |
| sandra | adam | linda |
+--------+-------+---------+
Now I want to know whether every person who has the same dad also has the same mom. I've been at this for an hour now, trying different approaches, but I keep getting stuck. Also, I'd like to learn an idiomatic R approach, not a long sequence of functions or for-loops that technically does what I want without teaching me anything new.
My expected output:
+--------+------+-------+
| name | dad | mom |
+--------+------+-------+
| quincy | adam | eve |
| sandra | adam | linda |
+--------+------+-------+
Essentially I want a data.frame of dads and moms who have kids with multiple partners.
So far my approach has been:
split the df by the father column
from the resulting list of data frames, remove all those with only one row (here I already get stuck; I can't make it work)
remove all data frames where length(unique(df$mom)) == 1
the resulting list should give me all siblings with different parents.
My code up to now:
fraternals <- split(kinship, kinship$father)
fraternals <- fraternals[-which(lapply(fraternals, function(x) if(nrow(x) == 1) { output TRUE }))]
but that doesn't run, because R says I can't use TRUE in that way.
One dplyr possibility could be:
df %>%
  group_by(dad) %>%
  filter(n_distinct(mom) != 1)
name dad mom
<chr> <chr> <chr>
1 quincy adam eve
2 sandra adam linda
If you don't want to filter, but rather to see this information flagged per row:
df %>%
  group_by(dad) %>%
  mutate(cond = n_distinct(mom) != 1)
name dad mom cond
<chr> <chr> <chr> <lgl>
1 john bert ernie FALSE
2 quincy adam eve TRUE
3 anna david goliath FALSE
4 daniel bert ernie FALSE
5 sandra adam linda TRUE
Here is an option using data.table:
library(data.table)
setDT(df)[, .SD[uniqueN(mom) != 1], .(dad)]
I've googled lots of examples of how to perform a COUNTIF in R; however, I still haven't found a solution for what I want.
I basically have 2 dataframes:
df1: customer_id | date_of_export - here, we have only 1 date of export per customer
df2: customer_id | date_of_delivery - here, a customer can have different delivery dates (which means, same customer will appear more than once in the list)
And I need to count, for each customer_id in df1, how many deliveries they got on or after the export date. So I need to count the rows where df1$customer_id == df2$customer_id AND df1$date_of_export <= df2$date_of_delivery.
To understand better:
customer_id | date_of_export
1 | 2018-01-12
2 | 2018-01-12
3 | 2018-01-12
customer_id | date_of_delivery
1 | 2018-01-10
1 | 2018-01-17
2 | 2018-01-13
2 | 2018-01-20
3 | 2018-01-04
My output should be:
customer_id | date_of_export | deliveries_after_export
1 | 2018-01-12 | 1 (one delivery after the export date)
2 | 2018-01-12 | 2 (two deliveries after the export date)
3 | 2018-01-12 | 0 (no delivery after the export date)
It doesn't seem that complicated, but I haven't found a good approach to do it. I've been struggling for two days and have accomplished nothing.
I hope I made myself clear here. Thank you!
I would suggest merging the two data.frames together; then it's a simple sum():
library(data.table)
df3 <- merge(df1, df2)  # merges on the shared customer_id column
# >= matches the question's date_of_export <= date_of_delivery condition
setDT(df3)[, .(deliveries_after_export = sum(date_of_delivery >= date_of_export)),
           by = .(customer_id, date_of_export)]
# customer_id date_of_export deliveries_after_export
#1: 1 2018-01-12 1
#2: 2 2018-01-12 2
#3: 3 2018-01-12 0
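For comparison, the same merge-then-count idea can be written in dplyr; a sketch assuming df1 and df2 share only the customer_id column:
library(dplyr)
df1 %>%
  left_join(df2, by = "customer_id") %>%
  group_by(customer_id, date_of_export) %>%
  # na.rm = TRUE keeps customers with no deliveries at all at a count of 0
  summarise(deliveries_after_export = sum(date_of_delivery >= date_of_export,
                                          na.rm = TRUE),
            .groups = "drop")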
I would like to extract unique values based on the sum of another column. For example, I have the following data frame, music:
ID | Song | artist | revenue
7520 | Dance with me | R Kelly | 2000
7531 | Gone girl | Vincent | 1890
8193 | Motivation | R Kelly | 3500
9800 | What | Beyonce | 12000
2010 | Excuse Me | Pharell | 1010
1999 | Remove me | Jack Will | 500
Basically, I would like to get the top 5 artists by total revenue, without duplicate entries for any given artist.
You just need order() to do this. For instance, to get the top 5 artist names:
head(unique(music$artist[order(music$revenue, decreasing = TRUE)]), 5)
or, to retain all columns (although keeping artists unique would be a little trickier):
head(music[order(music$revenue, decreasing=TRUE),])
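Note that ordering the raw rows ranks artists by their single highest-revenue song, not by their total. To rank by total revenue, as the question asks, aggregate first; a base-R sketch assuming the same music data frame:
rev_by_artist <- aggregate(revenue ~ artist, data = music, FUN = sum)
# top 5 artists by summed revenue, one row per artist
head(rev_by_artist[order(rev_by_artist$revenue, decreasing = TRUE), ], 5)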
Here's the dplyr way:
df <- read.table(text = "
ID | Song | artist | revenue
7520 | Dance with me | R Kelly | 2000
7531 | Gone girl | Vincent | 1890
8193 | Motivation | R Kelly | 3500
9800 | What | Beyonce | 12000
2010 | Excuse Me | Pharell | 1010
1999 | Remove me | Jack Will | 500
", header = TRUE, sep = "|", strip.white = TRUE)
You can group_by the artist and then choose how many entries you want to peek at (here just 3):
require(dplyr)
df %>%
  group_by(artist) %>%
  summarise(tot = sum(revenue)) %>%
  arrange(desc(tot)) %>%
  head(3)
Result:
Source: local data frame [3 x 2]
artist tot
1 Beyonce 12000
2 R Kelly 5500
3 Vincent 1890