Merge and update columns - r

I am trying to rebuild some MS Access update query logic with R's merge function, as the update query logic is missing a few arguments.
In my database "Invoice Account allocation", there are 2 tables:
Account_Mapping Table:
Origin country | Origin Postal code | Destination country | Destination postal code | Invoice Account
FRA | 01 | GBR | * | ZR001
FRA | 02 | BEL | * | ZR003
BEL | 50 | ARG | * | ZR002
GER | 01 | ITA | * | ZR002
POL | 02 | ESP | * | ZR001
ESP | 50 | NED | * | ZR003
* | 95 | FRA | 38 | ZR001
BEL | * | * | * | ZR002
* | * | * | * | ZR003
FRA | * | FRA | 25 | ZR004
Load_ID table:
ID | Origin country | Postal code | Destination | Destination postal code | Default Invoice Account
2019SN0201948 | FRA | 98 | FRA | 38 | XXAC001
2019SN0201958 | POL | 56 | GBR | 15 | XXAC001
2019SN0201974 | BEL | 50 | ARG | 27 | XXAC001
2019SN0201986 | FRA | 02 | GER | 01 | XXAC001
The Default Invoice Account in the Load_ID and Status_ID tables should be updated with the Invoice Account from the Account_Mapping table.
The tables Account_Mapping and Load_ID can be joined by:
Origin country & Origin country,
Origin Postal code & Postal code,
Destination country & Destination, and
Destination postal code & Destination postal code.
In the Account_Mapping table, several fields contain "*", which means that field can match any value. I am not able to build this logic with the merge function. Please help me with a better approach.
# Exact-match join: a "*" in Account_Mapping is only matched by a literal "*"
New_Assigned_Account_Final <- merge(Load_ID, Account_Mapping,
  by.x = c("Origin country", "Postal code", "Destination", "Destination postal code"),
  by.y = c("Origin country", "Origin Postal code", "Destination country", "Destination postal code"))
Desired result:
Updated Load_ID table as below.
Load_ID:
ID | Origin country | Postal code | Destination | Destination postal code | Default Invoice Account
2019SN0201948 | FRA | 98 | FRA | 38 | ZR003
2019SN0201958 | POL | 56 | GBR | 15 | ZR003
2019SN0201974 | BEL | 50 | ARG | 27 | ZR002
2019SN0201986 | FRA | 02 | GER | 01 | ZR003
For the first ID, the Default Invoice Account becomes "ZR003" because no more specific mapping row matches it: there is no row for Origin country "FRA" with Postal code "98", and the "FRA * FRA 25" row requires destination postal code 25, not 38. So it falls into the all-"*" bucket and is allocated to ZR003.
For the third ID, the Default Invoice Account becomes "ZR002" because there is a row for Origin country "BEL" with Postal code "50" and destination "ARG", and the "*" in its Destination postal code column accepts any value, so it is allocated to ZR002.
Thank you for your inputs.
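The "*" rule is hard to express as a plain merge(), because merge() only pairs rows whose keys are equal. One way to think about it: for each load, test every mapping row, treating "*" as "matches anything", and keep the account of the most specific matching row. The sketch below shows that logic in Python rather than R. The tie-breaking rule (fewest "*" wins, then the earlier row) is an assumption, chosen because it reproduces the desired result in the question, and the names `ACCOUNT_MAPPING` and `assign_account` are hypothetical.

```python
WILDCARD = "*"

# Account_Mapping rows from the question:
# (origin country, origin postal code, destination country, destination postal code, invoice account)
ACCOUNT_MAPPING = [
    ("FRA", "01", "GBR", "*", "ZR001"),
    ("FRA", "02", "BEL", "*", "ZR003"),
    ("BEL", "50", "ARG", "*", "ZR002"),
    ("GER", "01", "ITA", "*", "ZR002"),
    ("POL", "02", "ESP", "*", "ZR001"),
    ("ESP", "50", "NED", "*", "ZR003"),
    ("*", "95", "FRA", "38", "ZR001"),
    ("BEL", "*", "*", "*", "ZR002"),
    ("*", "*", "*", "*", "ZR003"),
    ("FRA", "*", "FRA", "25", "ZR004"),
]


def field_matches(pattern: str, value: str) -> bool:
    """A mapping field matches if it is the wildcard or equals the load's value."""
    return pattern == WILDCARD or pattern == value


def assign_account(load: tuple[str, str, str, str], mapping=ACCOUNT_MAPPING):
    """Return the account of the matching row with the fewest wildcards.

    Assumption: ties are broken by table order (the first such row wins).
    Returns None if no row matches.
    """
    best_account, best_wildcards = None, None
    for *keys, account in mapping:
        if all(field_matches(p, v) for p, v in zip(keys, load)):
            wildcards = keys.count(WILDCARD)
            if best_wildcards is None or wildcards < best_wildcards:
                best_account, best_wildcards = account, wildcards
    return best_account


# Load_ID rows from the question: (origin country, postal code, destination, destination postal code)
loads = {
    "2019SN0201948": ("FRA", "98", "FRA", "38"),
    "2019SN0201958": ("POL", "56", "GBR", "15"),
    "2019SN0201974": ("BEL", "50", "ARG", "27"),
    "2019SN0201986": ("FRA", "02", "GER", "01"),
}
for load_id, load in loads.items():
    print(load_id, assign_account(load))
```

This prints ZR003, ZR003, ZR002 and ZR003 for the four IDs, matching the desired table. In R the same idea can be built from a Cartesian product (merge(Load_ID, Account_Mapping, by = NULL)), keeping rows where each key is equal or "*", and then picking, per ID, the row with the fewest "*".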

Related

How can I get the key to increment when it is a string

I need to take someone's birth year and then output a key event from every year they have lived through.
dYearlyEvents = {
"1993": "Bill Clinton is inaugurated as the 42nd president.",
"1994": "The People's Republic of China gets its first connection to the Internet.",
"1995": "eBay is founded by Pierre Omidyar.",
"1996": "Murder of Tupac Shakur.",
"1997": "The first episode of Pokémon airs on TV Tokyo.",
"1998": "Death of Frank Sinatra.",
"1999": "The Columbine High School massacre in Colorado, United States, causes 15 deaths.",
"2000": "The Sony PlayStation 2 releases in Japan. ",
}
sBirthYear = (input("What year were you born in: \n"))
while True:
    if sBirthYear in dYearlyEvents:
        print(dYearlyEvents[sBirthYear])
        sBirthYear += 1
This is what I tried, but because the input is a string, it won't add a year on each pass through the loop to print all the events from 1993 to 2000; instead it just prints 1993.
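`input()` returns a string, so after printing the first event, `sBirthYear += 1` raises a `TypeError` (a `str` cannot be added to an `int`). Keeping the year as an `int` and converting it to a string only for the dictionary lookup avoids this, and a loop over a `range` replaces the `while True` loop, which has no exit. A minimal sketch; the helper name `events_since` is hypothetical and the dictionary is the one from the question.

```python
dYearlyEvents = {
    "1993": "Bill Clinton is inaugurated as the 42nd president.",
    "1994": "The People's Republic of China gets its first connection to the Internet.",
    "1995": "eBay is founded by Pierre Omidyar.",
    "1996": "Murder of Tupac Shakur.",
    "1997": "The first episode of Pokémon airs on TV Tokyo.",
    "1998": "Death of Frank Sinatra.",
    "1999": "The Columbine High School massacre in Colorado, United States, causes 15 deaths.",
    "2000": "The Sony PlayStation 2 releases in Japan.",
}


def events_since(birth_year: int, events: dict[str, str]) -> list[str]:
    """Return the event for every year from birth_year up to the last year in `events`."""
    last_year = max(int(year) for year in events)
    # Arithmetic happens on the int; str(year) is used only as the dictionary key.
    return [events[str(year)]
            for year in range(birth_year, last_year + 1)
            if str(year) in events]


# Interactive use: convert the input to int before doing arithmetic, e.g.
# birth_year = int(input("What year were you born in: \n"))
for event in events_since(1998, dYearlyEvents):
    print(event)
```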

JOINing databases with SQLite

I have 4 tables relating to the America's Cup.
SELECT * FROM teams
Code | Country | TeamName
ITA |Italy | Luna Rossa Prada Pirelli Team
NZ |New Zealand | Emirates Team New Zealand
UK |United Kingdom | INEOS Team UK
USA |United States of America | NYYC American Magic
4 rows
SELECT * FROM races
Race Tournament Date Racedate
RR1R1 RR 15-Jan 18642
RR1R2 RR 15-Jan 18642
RR1R3 RR 16-Jan 18643
RR2R1 RR 16-Jan 18643
RR2R2 RR 17-Jan 18644
RR2R3 RR 17-Jan 18644
RR3R1 RR 23-Jan 18650
RR3R2 RR 23-Jan 18650
RR3R3 RR 23-Jan 18650
SFR1 SF 29-Jan 18656
1-10 of 31 rows
SELECT * FROM tournaments
Tournament | Event | TournamentName
RR | Prada Cup | Round Robin
SF | Prada Cup | Semi-Final
F | Prada Cup | Final
AC | America's Cup | Americas Cup
4 rows
SELECT *
FROM results
Race Code Result
FR1 ITA Win
FR1 UK Loss
FR2 UK Loss
FR2 ITA Win
FR3 UK Loss
FR3 ITA Win
FR4 ITA Win
FR4 UK Loss
FR5 ITA Win
FR5 UK Loss
1-10 of 62 rows
I'm trying to write an SQL query that outputs the number of races each team won in each tournament. The output table should include the full name of the Event, the Tournament, and the full name of each team. My query currently looks like this:
SELECT TeamName, Result, Event, tournaments.Tournament
FROM teams LEFT JOIN results
ON teams.Code = results.Code
LEFT JOIN races
ON results.Race = races.Race
LEFT JOIN tournaments
ON races.Tournament = tournaments.Tournament
WHERE Result = 'Win'
ORDER BY tournaments.Tournament
which outputs:
TeamName Result Event Tournament
Emirates Team New Zealand Win America's Cup AC
Emirates Team New Zealand Win America's Cup AC
Luna Rossa Prada Pirelli Team Win America's Cup AC
Luna Rossa Prada Pirelli Team Win America's Cup AC
Emirates Team New Zealand Win America's Cup AC
Luna Rossa Prada Pirelli Team Win America's Cup AC
Emirates Team New Zealand Win America's Cup AC
Emirates Team New Zealand Win America's Cup AC
Emirates Team New Zealand Win America's Cup AC
Emirates Team New Zealand Win America's Cup AC
When I add COUNT(Result) AS NumberOfWins to the SELECT list, I get:
TeamName Result NumberOfWins Event Tournament
Luna Rossa Prada Pirelli Team Win 31 Prada Cup F
1 row
Why does adding the count count only Luna Rossa's wins? How can I change the query to fix it?
Why does adding the count count only Luna Rossa's wins?
Count() is an aggregate function and produces one result per GROUP.
As you have no GROUP BY clause the entire result set is a single group and hence the single result.
The reason you got Tournament F is explained in the SQLite SELECT documentation:
"If the SELECT statement is an aggregate query without a GROUP BY clause, then each aggregate expression in the result-set is evaluated once across the entire dataset. Each non-aggregate expression in the result-set is evaluated once for an arbitrarily selected row of the dataset. The same arbitrarily selected row is used for each non-aggregate expression. Or, if the dataset contains zero rows, then each non-aggregate expression is evaluated against a row consisting entirely of NULL values."
How can I change the query to fix it?
You need a GROUP BY clause to create the groups that count() operates on.
You probably want to group by tournament and team name, e.g.:
SELECT TeamName, Result, Event, tournaments.Tournament, count(*) AS NumberOfWins
FROM teams LEFT JOIN results
ON teams.Code = results.Code
LEFT JOIN races
ON results.Race = races.Race
LEFT JOIN tournaments
ON races.Tournament = tournaments.Tournament
WHERE Result = 'Win'
-- qualified: both races and tournaments have a Tournament column
GROUP BY tournaments.Tournament, teams.TeamName
ORDER BY tournaments.Tournament
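To see the grouping in action, here is a small self-contained check using Python's built-in sqlite3 module. It loads only the results rows shown in the question (races FR1 to FR5) plus the matching teams and tournaments rows. That races FR1 to FR5 belong to tournament F is an assumption, since those races are not in the excerpt of the races table; `SCHEMA_AND_DATA` and `WINS_QUERY` are illustrative names.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE teams (Code TEXT, Country TEXT, TeamName TEXT);
CREATE TABLE races (Race TEXT, Tournament TEXT);
CREATE TABLE tournaments (Tournament TEXT, Event TEXT, TournamentName TEXT);
CREATE TABLE results (Race TEXT, Code TEXT, Result TEXT);

INSERT INTO teams VALUES
  ('ITA', 'Italy', 'Luna Rossa Prada Pirelli Team'),
  ('UK', 'United Kingdom', 'INEOS Team UK');
INSERT INTO tournaments VALUES ('F', 'Prada Cup', 'Final');
-- Assumption: races FR1-FR5 belong to tournament F.
INSERT INTO races VALUES ('FR1', 'F'), ('FR2', 'F'), ('FR3', 'F'), ('FR4', 'F'), ('FR5', 'F');
INSERT INTO results VALUES
  ('FR1', 'ITA', 'Win'), ('FR1', 'UK', 'Loss'), ('FR2', 'UK', 'Loss'), ('FR2', 'ITA', 'Win'),
  ('FR3', 'UK', 'Loss'), ('FR3', 'ITA', 'Win'), ('FR4', 'ITA', 'Win'), ('FR4', 'UK', 'Loss'),
  ('FR5', 'ITA', 'Win'), ('FR5', 'UK', 'Loss');
"""

# One output row per (tournament, team) group, with the number of wins in it.
WINS_QUERY = """
SELECT tournaments.Event, tournaments.Tournament, teams.TeamName,
       COUNT(*) AS NumberOfWins
FROM teams
LEFT JOIN results ON teams.Code = results.Code
LEFT JOIN races ON results.Race = races.Race
LEFT JOIN tournaments ON races.Tournament = tournaments.Tournament
WHERE results.Result = 'Win'
GROUP BY tournaments.Tournament, teams.TeamName
ORDER BY tournaments.Tournament, teams.TeamName
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
for row in conn.execute(WINS_QUERY):
    print(row)
```

INEOS Team UK does not appear in the output because the WHERE filter removes every row for a team without a win.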

Extract date from a text document in R

I am again here with an interesting problem.
I have a document like shown below:
"""UDAYA FILLING STATION ps\na MATTUPATTY ROAD oe\noe 4 MUNNAR Be:\nSeat 4 04865230318 Rat\nBree 4 ORIGINAL bepas e\n\noe: Han Die MC DE ER DC I se ek OO UO a Be ten\" % aot\n: ag 29-MAY-2019 14:02:23 [i\n— INVOICE NO: 292 hee fos\nae VEHICLE NO: NOT ENTERED Bea\nss NOZZLE NO : 1 ome\n- PRODUCT: PETROL ae\ne RATE : 75.01 INR/Ltr yee\n“| VOLUME: 1.33 Ltr ae\n~ 9 =6AMOUNT: 100.00 INR mae wae\nage, Ee pel Di EE I EE oe NE BE DO DC DE a De ee De ae Cate\notome S.1T. No : 27430268741C =. ver\nnes M.S.T. No: 27430268741V ae\n\nThank You! Visit Again\n""""
From the above document, I need to extract the date (29-MAY-2019).
I tried the strptime function but did not get the desired result.
Any help will be greatly appreciated.
Thanks in advance.
Assuming you only want to capture a single date, you may use sub here:
text <- "UDAYA FILLING STATION ps\na MATTUPATTY ROAD oe\noe 4 MUNNAR Be:\nSeat 4 04865230318 Rat\nBree 4 ORIGINAL bepas e\n\noe: Han Die MC DE ER DC I se ek OO UO a Be ten\" % aot\n: ag 29-MAY-2019 14:02:23 [i\n— INVOICE NO: 292 hee fos\nae VEHICLE NO: NOT ENTERED Bea\nss NOZZLE NO : 1 ome\n- PRODUCT: PETROL ae\ne RATE : 75.01 INR/Ltr yee\n“| VOLUME: 1.33 Ltr ae\n~ 9 =6AMOUNT: 100.00 INR mae wae\nage, Ee pel Di EE I EE oe NE BE DO DC DE a De ee De ae Cate\notome S.1T. No : 27430268741C =. ver\nnes M.S.T. No: 27430268741V ae\n\nThank You! Visit Again\n"
date <- sub("^.*\\b(\\d{2}-[A-Z]+-\\d{4})\\b.*", "\\1", text)
date
[1] "29-MAY-2019"
If you need to match multiple such dates in your text, use regmatches with gregexpr (regexec returns only the first match and its capture groups):
text <- "Hello World 29-MAY-2019 Goodbye World 01-JAN-2018"
regmatches(text, gregexpr("\\b\\d{2}-[A-Z]+-\\d{4}\\b", text, perl = TRUE))[[1]]
[1] "29-MAY-2019" "01-JAN-2018"
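The same two-step approach (find every DD-MON-YYYY token, then parse it) looks like this in Python, where `datetime.strptime` with `%d-%b-%Y` turns the extracted text into a real date. This is a sketch only: `%b` matches English month abbreviations (case-insensitively) under the default C locale, and the name `extract_dates` is hypothetical.

```python
import re
from datetime import date, datetime

# Two digits, a three-letter upper-case month, four digits, as whole words.
DATE_PATTERN = re.compile(r"\b\d{2}-[A-Z]{3}-\d{4}\b")


def extract_dates(text: str) -> list[date]:
    """Find every DD-MON-YYYY token in `text` and parse it into a date."""
    return [datetime.strptime(token, "%d-%b-%Y").date()
            for token in DATE_PATTERN.findall(text)]


print(extract_dates("Hello World 29-MAY-2019 Goodbye World 01-JAN-2018"))
```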

Specify a time to users in multiple time zones (especially USA/Canada)

I'm implementing a sign-up facility that sends a link containing a token; the token is valid for 1 hour. So in the email (let's say it is 14:20 now) I want to say:
You must click this link by 15:20
The audience for this site will be in Ireland / UK, USA / Canada and perhaps some in Europe - so I wanted to list the expiry time in several time zones that these (non technical) people will understand.
So this is what I came up with
Click by:
Ireland/UK > 25 Apr 2018 13:59
CET (Berlin) > 25 Apr 2018 14:59
Pacific (Los Angeles) > 25 Apr 2018 05:59
Mountain (Denver) > 25 Apr 2018 06:59
Central (Chicago) > 25 Apr 2018 07:59
Eastern (New York) > 25 Apr 2018 08:59
Now, I understand that Denver is currently on MDT (and MST in the winter), but here in Ireland we are currently on IST (UTC+1), and on GMT in the autumn and winter; yet if you ask a random person what time zone we are in, at best you will get GMT as the answer all year round. So I list the time there as a generic 'Mountain' and give a sample city.
How is this approach for people in USA / Canada?
My code is below:
<?php
// Token expiry: one hour from now (Unix timestamp).
$expiry = time() + 60 * 60;

// '@' timestamps are interpreted as UTC, independent of the server's default time zone.
$expiryTime = new DateTime('@' . $expiry);

// Label shown to the user => IANA time zone identifier.
$zones = [
    'Ireland/UK'            => 'Europe/Dublin',
    'CET (Berlin)'          => 'Europe/Berlin',
    'Pacific (Los Angeles)' => 'America/Los_Angeles',
    'Mountain (Denver)'     => 'America/Denver',
    'Central (Chicago)'     => 'America/Chicago',
    'Eastern (New York)'    => 'America/New_York',
];

print("<h1>1 hour from now is:</h1>");
foreach ($zones as $label => $tz) {
    // Convert the same instant into each zone and format it.
    $expiryTime->setTimezone(new DateTimeZone($tz));
    print($label . ' > ' . $expiryTime->format('d M Y H:i') . '<p>');
}
?>
Looks correct to me.
Be sure to test it for 'Asia/Kolkata' too. That's a good test because its time zone offset is on a half-hour.
Ditto for 'America/Phoenix' because they stay on standard time all year.
Usually apps like this ask each user to provide a time zone name during onboarding (but many users don't bother).
In the US when we want to specify a timezone in a way where it doesn't have to change between summer and winter, we say "Eastern Time", "Central Time", "Mountain Time", "Pacific Time," and Hawaii and Alaska time. The Canadians also have "Atlantic Time" ('America/Halifax'). In Arizona ('America/Phoenix') they say 'Arizona Time'.
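If you want to check those edge cases quickly, the sketch below converts one UTC instant into the same zones with Python's zoneinfo module (Python 3.9+; it needs the IANA tz database on the system or the tzdata package). The labels and the extra 'Asia/Kolkata' and 'America/Phoenix' entries follow the question and the answer; `expiry_lines` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Label shown to the user -> IANA time zone name (the zones from the PHP code,
# plus the two extra test cases suggested in the answer).
ZONES = {
    "Ireland/UK": "Europe/Dublin",
    "CET (Berlin)": "Europe/Berlin",
    "Pacific (Los Angeles)": "America/Los_Angeles",
    "Mountain (Denver)": "America/Denver",
    "Central (Chicago)": "America/Chicago",
    "Eastern (New York)": "America/New_York",
    "Arizona (Phoenix)": "America/Phoenix",
    "India (Kolkata)": "Asia/Kolkata",
}


def expiry_lines(expiry_utc: datetime) -> list[str]:
    """Render one timezone-aware instant in every zone listed in ZONES."""
    return [f"{label} > {expiry_utc.astimezone(ZoneInfo(name)):%d %b %Y %H:%M}"
            for label, name in ZONES.items()]


if __name__ == "__main__":
    expiry = datetime.now(timezone.utc) + timedelta(hours=1)
    for line in expiry_lines(expiry):
        print(line)
```

With a fixed instant of 25 Apr 2018 12:59 UTC this reproduces the times listed in the question, shows Kolkata at 18:29 (the half-hour offset), and shows Phoenix at 05:59 while Denver is at 06:59, because Arizona stays on standard time all year.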

Row count for a column

In my subreport I want to display, for example:
Number of clients born in 1972: 34
So in the database I have a list of their birth years
How can I display this number in a field?
Here is a sample of the data:
<Born> <Name> <BleBle>
1981 Mnr EH Van Niekerk 9517
1982 MEV A BELL 9520
1972 Mnr GI van der Westhuize 9517
1987 Mnr A Juyn 9517
1983 Mev MJC Prinsloo 9513
1972 Mnr WA Van Rensburg 9517
1989 Kmdt EL Van Der Colff 9514
1972 Mnr JS Jansen Van Vuuren 9517
So if this was all the data the output would have to be
Number of clients born in 1972: 3
Create a variable BORN_IN_1972.
Set its "Variable class" to java.lang.Integer.
Set "Calculation" to "Count".
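Set "Variable Expression" to "1972".equals(String.valueOf($F{Born})) ? $F{Born} : null (Count counts every non-null value, so rows for other years must return null).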
Set "Initial Value Expression" to 0.
Then add a "Summary" band to your report, and put the static text "Number of clients born in 1972:" and the text field "$V{BORN_IN_1972}" into it.
Assuming birth year is a string:
SELECT COUNT(*)
FROM MyClients
WHERE birth_year = '1972'
And if birth year is being used as an input control:
SELECT COUNT(*)
FROM MyClients
WHERE birth_year = $P{birth_year}
To count non-zero records in jasper use the expression below -
( $F{test} == 0.0 ? null : $F{test} )
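The Jasper approaches rely on a variable with Calculation "Count" counting only the non-null values of its expression, so returning null for unwanted rows excludes them. As a rough analogue in plain Python (not Jasper; the helper `jasper_style_count` is hypothetical), using the sample data from the question:

```python
from typing import Iterable, Optional

# (Born, Name, BleBle) rows from the sample data in the question.
CLIENTS = [
    (1981, "Mnr EH Van Niekerk", 9517),
    (1982, "MEV A BELL", 9520),
    (1972, "Mnr GI van der Westhuize", 9517),
    (1987, "Mnr A Juyn", 9517),
    (1983, "Mev MJC Prinsloo", 9513),
    (1972, "Mnr WA Van Rensburg", 9517),
    (1989, "Kmdt EL Van Der Colff", 9514),
    (1972, "Mnr JS Jansen Van Vuuren", 9517),
]


def jasper_style_count(values: Iterable[Optional[object]]) -> int:
    """Count non-null values, like a Jasper variable with Calculation = Count."""
    return sum(1 for value in values if value is not None)


# Expression analogue: the value when Born is 1972, otherwise null (None).
born_in_1972 = jasper_style_count(born if born == 1972 else None for born, _name, _ble in CLIENTS)
print(f"Number of clients born in 1972: {born_in_1972}")
```

Counting the raw Born value instead would count all eight rows, which is why the expression has to map other years to null.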
