Assign a new ID based on multiple columns (in R) - r

This is a bit of a weird one. I've got a dataset with thousands of rows. I can't share it. The headers include:
year
reporter ID
reporter name
building ID
building name
controller ID
controller name
latitude (for some)
longitude (for some)
other columns not used for identification
The buildings are my unit of analysis. However, there are problems.
First, the data was entered manually, so names that should be constant change from year to year (e.g. "Business Inc." and then "Business Incorporated"). Worse, the building ID changes when the controller or the reporter changes (which happens when buildings get sold).
For example, in 2015 a building might be called "Big Building", have a building ID of "1111", and have a controller called "Tiny Tim". Then in 2016 it is sold: now it appears as "The Big Building", the building ID is "4567", and the controller is "Tiny Tim". It is the same building, but this dataset doesn't track that. This is my problem.
What I want: to create a new ID column that actually identifies buildings, and does not change across years or reporters or controllers. This ID could then be used in conjunction with year to lookup the reporter and controller (if desired).
But I don't know how to do this. I figure there must be something that can look at 'building name' (noting that it can change slightly from year to year), as well as the other supporting ID columns, and decide whether to assign a new ID in a new column or, if an ID has already been assigned to this building, reuse that one. Does that make sense?
Can someone please point me in the right direction to get started?
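One common starting point is record linkage: decide pairwise whether two rows describe the same building, then take connected groups of matching rows as one building. A minimal sketch follows, written in Python rather than R. The matching rules (same building ID, or similar normalized names plus a shared controller or reporter), the 0.85 similarity threshold, and the dictionary keys are all assumptions for illustration, not columns or rules taken from the real dataset.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case, drop punctuation and the word 'the' (assumed normalization rules)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(w for w in cleaned.split() if w != "the")


class UnionFind:
    """Disjoint-set structure: rows judged to be the same building share one root."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def same_building(a: dict, b: dict, threshold: float) -> bool:
    # Assumed rule 1: an identical building ID always means the same building.
    if a["building_id"] == b["building_id"]:
        return True
    # Assumed rule 2: similar names AND a shared controller or reporter.
    score = SequenceMatcher(
        None, normalize(a["building_name"]), normalize(b["building_name"])
    ).ratio()
    shares_party = (a["controller_id"] == b["controller_id"]
                    or a["reporter_id"] == b["reporter_id"])
    return score >= threshold and shares_party


def assign_stable_ids(rows: list, threshold: float = 0.85) -> list:
    """Return one new, year-independent ID (1, 2, 3, ...) per row."""
    uf = UnionFind(len(rows))
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):  # O(n^2) pairwise comparison
            if same_building(rows[i], rows[j], threshold):
                uf.union(i, j)
    labels = {}  # cluster root -> compact new ID
    result = []
    for i in range(len(rows)):
        root = uf.find(i)
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result


# Hypothetical rows modelled on the "Big Building" example above.
rows = [
    {"year": 2015, "building_id": "1111", "building_name": "Big Building",
     "controller_id": "C1", "reporter_id": "R1"},
    {"year": 2016, "building_id": "4567", "building_name": "The Big Building",
     "controller_id": "C1", "reporter_id": "R2"},
    {"year": 2016, "building_id": "2222", "building_name": "Small House",
     "controller_id": "C9", "reporter_id": "R2"},
]

if __name__ == "__main__":
    print(assign_stable_ids(rows))  # [1, 1, 2]
```

Because matches are merged transitively, a chain of small year-on-year changes still ends up under one ID. With thousands of rows the pairwise loop gets slow, so in practice you would only compare rows within a "block" (for example the same controller, or nearby latitude/longitude). In R, the `stringdist` package and `igraph::components()` can play the roles of the similarity score and the union-find step.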


HERE Navstreets LINK_ID Range

My organization has acquired the HERE Navstreets data set. It wishes to update the content while still adhering to the HERE Navstreets data model and relationships.
In this context, it is deemed of value to:
Retain the LINK_ID column as the unique identifier for each street segment.
Make a distinction between the original HERE LINK_ID values and the one added by my organization.
Retain the ability to ingest streets updates from HERE should my organization decide to do so.
In this context, we would like to use a different range of LINK_ID values from the one used by HERE. As an example, if HERE uses values between 10,000,000 and 100,000,000, we would assign only LINK_ID values that are within the range 1,000-9,999,999 (this is only for illustration purposes).
Is this approach already accounted for by HERE for the street data they may get from Map Creator? Is there a specific HERE (for Review or Work in Progress) range of LINK_ID values we should consider?
Based on the HERE KB0015682: Permanent ID concept in HERE Data
Entities with Permanent IDs
Generally, the following features have permanent IDs in the HERE Map products:
Lane
Face
Point Features
Administrative Areas (for , Built-up Areas, Districts, and Administrative Areas)
Complex Features (this includes Complex Administrative Area Features as well as Complex Intersections and Complex Roads)
Permanent IDs are globally unique within a specific object type; e.g., a Link ID occurs only once globally. However, the same Permanent ID can be used among different object types (e.g., Node, Link, Condition). Note: when a map is upgraded to Intermediate or HERE map, or when a country undergoes administrative restructuring, Permanent IDs may change.
The following are examples of permanent IDs in the RDF:
Address Point ID
Admin Place ID
Association ID
Building ID
Carto ID
Complex Feature ID
Condition ID
Country ID (which is one of the Admin Place IDs)
Face ID
Feature Point ID
Lane ID
Lane Nav Strand ID
Link ID
Name ID (with some exceptions)
Nav Strand ID
Node ID
POI ID
Road Link ID
Sign ID
Zone ID
Numeric Range of Permanent IDs
Map object IDs (PVIDs) in the extracts use 32-bit integer values that fit in an N(10) scheme. Note: exceptions to the N(10) scheme can exist; for example, Lane ID is N(12) in length.
The entire range is divided as follows:
Range----------------------------Designation
0000000001 to 0016777215 -> Non-permanent IDs
0016777215 to 2147483647 -> Permanent IDs
The range dedicated to permanent IDs is used for any entity.
The range dedicated to non-permanent IDs is used in rare situations where an update is made in a copy of the database instead of in the live database itself, and this update results in a new ID. This new ID in the database copy would be in the non-permanent range. The update would also be applied to the live database, where it would receive a permanent ID available in the next scheduled release. A cross-reference is not provided between non-permanent IDs and the eventual permanent ID from the live database.
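The range table can be turned into a small classifier, which also makes it easy to test a proposed custom range against it. A minimal Python sketch follows. It assumes that 16,777,215, which the KB lists both as the end of the non-permanent range and as the start of the permanent range, belongs to the non-permanent range, and it only covers N(10) IDs (not N(12) ones such as Lane IDs). `CUSTOM` is the illustrative 1,000 to 9,999,999 range from the question.

```python
# Ranges as quoted from HERE KB0015682. ASSUMPTION: the shared boundary value
# 16,777,215 is treated as non-permanent, so permanent IDs start at 16,777,216.
NON_PERMANENT = range(1, 16_777_215 + 1)
PERMANENT = range(16_777_216, 2_147_483_647 + 1)  # up to the max signed 32-bit int

# Hypothetical organization-owned range from the question (illustration only).
CUSTOM = range(1_000, 9_999_999 + 1)


def classify_pvid(pvid: int) -> str:
    """Classify an N(10) PVID according to the quoted ranges."""
    if pvid in NON_PERMANENT:
        return "non-permanent"
    if pvid in PERMANENT:
        return "permanent"
    return "outside quoted ranges"


def overlaps(a: range, b: range) -> bool:
    """True if two half-open integer ranges share at least one value."""
    return a.start < b.stop and b.start < a.stop


if __name__ == "__main__":
    print(classify_pvid(1_000))                                   # non-permanent
    print(overlaps(CUSTOM, NON_PERMANENT), overlaps(CUSTOM, PERMANENT))  # True False
```

The check shows that the illustrative custom range lies entirely inside what HERE reserves for non-permanent IDs, so IDs from a database copy could collide with it. Whether HERE tolerates that is not answered by the KB text quoted above.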

How to determine trial booking conversion - google sheets

Introduction
Many sites use WooCommerce as a plugin for their WordPress site and so do we :). We've linked all purchases to google sheets, so I can do some analyses.
Our goal is to get as many children physically active as we can. We run gym classes for very young children and their parents, to teach them the basics and the fun of physical activity.
What I would like to do
I would like to know how many free trial classes actually convert to paying customers, and what the average time span is between booking a trial class and becoming a member.
The data that I have
I have the following columns which are necessary for this, I believe:
Datestamp
paymentID (is empty when booking a free trial class)
Price (is 0,00 when booking a free trial class)
childsName (Unique in combination with parentsEmailadress, but recurs every month in the list once a membership is active)
parentsEmailadress (may have several children)
OrderName (has the string "trial" or "Membership")
I've made some dummy data in the following sheet:
https://docs.google.com/spreadsheets/d/1lWzQbXMU4qRLp_2qiQ_qsq57nPMy2RG8AHDMGKW626E/edit?usp=sharing
Possible solution
My guess is that I should:
1. Make a column in which I combine the child's name and the email address.
2. Make a TRUE or FALSE column to check whether the order is a trial class or not.
3. Make a column that finds the first earlier order with the same child/email combination (how do I do that? VLOOKUP?), and then:
   - if it is found, check whether that order is a trial class order:
     - if it is, determine the number of days between the trial class order and the non-trial order and display that number;
     - if it is another normal order, leave the cell empty (it's just a membership order);
   - if the email address is not found, display "direct" (it's a directly bought membership).
I did 1 and 2 and tried 3 with:
=ALS(H2=0;VERT.ZOEKEN(G2;A:G;1;ONWAAR);) (in Dutch)
=IF(H2=0;VLOOKUP(G2;A:G;1;FALSE);) (English equivalent)
But this doesn't work.
I really hope someone can point me in the right direction!
Thank you very much in advance!
Given that trial classes have a price of 0, there's no need to create another column to identify those cases: just check the price. To the source data we'll add the "Client ID" column that you created in Column G of your sample sheet. (Ideally, you'll come up with a proper client ID system, but this works.) Now, create a new sheet named Dashboard (the formulas below refer to it by that name) and add a few columns:
Client ID This grabs only the unique values from the Client IDs in Sheet2, as we don't want users to be repeated in our dashboard. (Column A, row 1; for all other columns, place the formulas in row 2.)
=UNIQUE(Sheet2!G:G)
Did they trial? This will tell you TRUE/FALSE if the client did a trial. (Column B).
=ISNUMBER(MATCH(Dashboard!A2, FILTER(Sheet2!G:G, Sheet2!C:C=0), 0))
Did they convert? This will tell you TRUE/FALSE if the client converted from trial to paid. (In cases where they did not do a trial, it will be blank.) (Column C).
=IF(Dashboard!B2, ISNUMBER(MATCH(Dashboard!A2, FILTER(Sheet2!$G:$G, Sheet2!$C:$C>0), 0)), "")
Date of First Trial The date of the first trial. If none exists, will be blank. (Column D).
=IFERROR(MIN(FILTER(Sheet2!$A:$A, Sheet2!$G:$G=Dashboard!A2, Sheet2!$C:$C=0)))
Date of First Paid Course The date of the first paid course. If none exists, will be blank. (Column E).
=IFERROR(MIN(FILTER(Sheet2!$A:$A, Sheet2!$G:$G=Dashboard!A2, Sheet2!$C:$C>0)))
Days from First Trial to First Paid The number of days between the first paid course and the first trial course. If one of those values doesn't exist, then will be blank. (Column F).
=IF(AND(ISNUMBER(Dashboard!D2), ISNUMBER(Dashboard!E2)), Dashboard!E2-Dashboard!D2, "")
Now you can answer several questions:
How many clients used a free trial? =COUNTIF(Dashboard!B:B, TRUE)
How many free trials converted? =COUNTIF(Dashboard!C:C, TRUE)
Average number of days from first trial to first paid? =AVERAGE(Dashboard!F:F)
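If you want to sanity-check the dashboard numbers outside Sheets, the same logic fits in a few lines of Python. This is a minimal sketch, assuming each order is a `(date, price, client_id)` tuple where `client_id` is the child-name-plus-email combination from Column G; the sample orders are hypothetical, not taken from the shared sheet.

```python
from datetime import date
from statistics import mean


def trial_dashboard(orders):
    """orders: iterable of (order_date, price, client_id) tuples (assumed layout)."""
    first_trial, first_paid = {}, {}
    for order_date, price, client in orders:
        # Price 0 marks a free trial, as in the Sheets formulas.
        target = first_trial if price == 0 else first_paid
        if client not in target or order_date < target[client]:
            target[client] = order_date  # keep the earliest date per client
    clients = set(first_trial) | set(first_paid)
    trialled = [c for c in clients if c in first_trial]
    converted = [c for c in trialled if c in first_paid]
    days = [(first_paid[c] - first_trial[c]).days for c in converted]
    return {
        "trials": len(trialled),
        "conversions": len(converted),
        "direct": len([c for c in clients if c not in first_trial]),
        "avg_days": mean(days) if days else None,
    }


# Hypothetical orders: anna trials then converts, bram only trials, cees buys directly.
orders = [
    (date(2020, 1, 1), 0.0, "anna|a@example.nl"),
    (date(2020, 1, 15), 25.0, "anna|a@example.nl"),
    (date(2020, 2, 15), 25.0, "anna|a@example.nl"),
    (date(2020, 1, 3), 0.0, "bram|b@example.nl"),
    (date(2020, 1, 10), 25.0, "cees|c@example.nl"),
]

if __name__ == "__main__":
    print(trial_dashboard(orders))
```

As in the sheet, only the first trial and the first paid order per client are used, so later monthly membership rows do not affect the average.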

Where to keep related patient information in FHIR

I am trying to convert a patient into a FHIR-compliant Patient. One of the attributes in our patient structure is 'related patients'. This attribute lists all other patients who are related to the given patient.
For example, patient p1 is the father of patient p2, so p1 has an attribute 'related patients' whose value is a list containing p2.
Where should I keep this 'related patients' information in the FHIR object model?
The FamilyMemberHistory resource has an extension (http://www.hl7.org/fhir/extension-familymemberhistory-patient-record.html) that lets you link that particular family member to the corresponding Patient record for that person. At the moment, there's no way to do that directly from Patient, though you could define an extension that would do that if you really needed to.
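A minimal sketch of what that could look like, built as plain JSON from Python. It assumes R4-style resources: the extension's canonical URL below is my reading of the linked extension page, and the `v3-RoleCode` system and `FTH` (father) code should be checked against the FHIR version you actually target. `family_member_history` is a hypothetical helper, not part of any FHIR library.

```python
import json

# ASSUMED canonical URL for the extension documented at
# http://www.hl7.org/fhir/extension-familymemberhistory-patient-record.html
PATIENT_RECORD_EXT = (
    "http://hl7.org/fhir/StructureDefinition/familymemberhistory-patient-record"
)


def family_member_history(patient_id, relative_id, role_code, role_display):
    """Build a FamilyMemberHistory for patient_id that links to relative_id's Patient."""
    return {
        "resourceType": "FamilyMemberHistory",
        "status": "completed",
        # The patient whose family history this is.
        "patient": {"reference": f"Patient/{patient_id}"},
        # How the family member is related to that patient (assumed R4 code system).
        "relationship": {
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/v3-RoleCode",
                "code": role_code,
                "display": role_display,
            }]
        },
        # The extension pointing at the family member's own Patient record.
        "extension": [{
            "url": PATIENT_RECORD_EXT,
            "valueReference": {"reference": f"Patient/{relative_id}"},
        }],
    }


if __name__ == "__main__":
    # p1 is the father of p2: the history belongs to p2 and points back at p1.
    print(json.dumps(family_member_history("p2", "p1", "FTH", "father"), indent=2))
```

To represent the full 'related patients' list, you would create one such resource per relative, each attached to the patient whose history it describes.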

MS Access 2010, Distinct in a Report?

I have a report that is based on a query of diagnoses (for example diabetes). The report returns a list of patients with that diagnosis. The problem is that if John Q has diabetes xyz and diabetes 123 and I run the report to list everyone with diabetes, it will return his name twice. I really don't want to change the query this report is based on; I just want distinct names in the report. Is there a way to use DISTINCT in a report, or some other way to list each name only once? Or am I going to have to write a DISTINCT query just for this report?
either edit the query by changing SELECT to
SELECT DISTINCT
or set the report to use grouping: you can group by name (or, more likely, by a patient ID, so two people with the same name don't get combined), as described here.
If you exclude the types of diabetes from the report, you will get each person on one line; if you include them, each person's details will be used as a heading with their types of diabetes indented below it.
If the report already exists you can edit it by using the Grouping and Sorting option https://support.office.com/en-za/article/Create-a-grouped-or-summary-report-6a58e9ab-9f74-4689-83b6-c63cddb2c7f9?ui=en-US&rs=en-ZA&ad=ZA#__migbm_0
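The difference between the two approaches can be shown with an in-memory SQLite table standing in for the Access query. This is a sketch only: the table and column names are hypothetical, and Access's default wildcard for LIKE is `*` rather than the `%` SQLite uses. It also illustrates why DISTINCT alone is not enough if the query still selects the diagnosis column: DISTINCT only collapses rows that are identical in every selected column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Diagnoses (PatientID INTEGER, PatientName TEXT, Diagnosis TEXT)")
con.executemany(
    "INSERT INTO Diagnoses VALUES (?, ?, ?)",
    [(1, "John Q", "diabetes xyz"), (1, "John Q", "diabetes 123"), (2, "Jane R", "diabetes xyz")],
)

# Original-style query: one row per diagnosis, so John Q appears twice.
dupes = con.execute(
    "SELECT PatientID, PatientName, Diagnosis FROM Diagnoses "
    "WHERE Diagnosis LIKE 'diabetes%'"
).fetchall()

# SELECT DISTINCT works once Diagnosis is left out of the column list.
distinct = con.execute(
    "SELECT DISTINCT PatientID, PatientName FROM Diagnoses "
    "WHERE Diagnosis LIKE 'diabetes%' ORDER BY PatientID"
).fetchall()

# Grouping by patient ID gives one row per person, like a grouped report.
grouped = con.execute(
    "SELECT PatientID, PatientName, COUNT(*) FROM Diagnoses "
    "WHERE Diagnosis LIKE 'diabetes%' GROUP BY PatientID, PatientName ORDER BY PatientID"
).fetchall()

if __name__ == "__main__":
    print(len(dupes), distinct, grouped)
```

Grouping on PatientID rather than the name is what keeps two different people called "John Q" apart.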

Enter data in mother table using data from child tables

Hi all,
I have 3 tables in an Access 2010 database:
Crew: CrewID; Name; Adres;...
Voyage: VoyageId; Voyage name; Departure harbour; Arrival harbour
Crewlist: CrewlistId, VoyageId, CrewId, Rank
The VoyageId and CrewId fields in the Crewlist table are linked (related) to the AutoNumber IDs in tables 2 and 1.
My first and main question: upon boarding, everyone has to ‘sign in’ by selecting the voyage and their name, and be assigned a role (or have this done by the responsible officer). How can I make a form that lets users browse through voyage names and crew names instead of the IDs used in the ‘mother’ table (table 3: Crewlist)?
2nd question: how can I make sure that someone isn’t enrolled twice for the same voyage (i.e. the same VoyageId and CrewId added twice to Crewlist)? Ideally, adding the same person to a voyage a second time would be blocked.
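To prevent duplicates in Crewlist, add a single unique index to the table that covers both CrewId and VoyageId together (not two separate unique indexes, which would stop a crew member from sailing on more than one voyage).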
It would be a good idea to add relationships and enforce referential integrity
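Both suggestions can be tried out with an in-memory SQLite database, which enforces a composite unique index and foreign keys in the same way the Access settings are meant to. A minimal sketch, assuming simplified versions of the three tables: `CrewName` and `CrewRank` are hypothetical renames that avoid `Name` and `Rank`, following the naming advice in this answer, and the sample rows are made up. In Access itself the equivalent index can be created in the table's Indexes dialog or with `CREATE UNIQUE INDEX ... ON Crewlist (VoyageId, CrewId)`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
con.executescript("""
CREATE TABLE Crew   (CrewID   INTEGER PRIMARY KEY, CrewName   TEXT);
CREATE TABLE Voyage (VoyageID INTEGER PRIMARY KEY, VoyageName TEXT);
CREATE TABLE Crewlist (
    CrewlistID INTEGER PRIMARY KEY,
    VoyageID   INTEGER NOT NULL REFERENCES Voyage(VoyageID),  -- referential integrity
    CrewID     INTEGER NOT NULL REFERENCES Crew(CrewID),
    CrewRank   TEXT
);
-- One index over the PAIR: a person can sail many voyages, but only once per voyage.
CREATE UNIQUE INDEX ux_crewlist_voyage_crew ON Crewlist (VoyageID, CrewID);
INSERT INTO Crew VALUES (1, 'Jan');
INSERT INTO Voyage VALUES (10, 'Outbound');
INSERT INTO Voyage VALUES (11, 'Return');
INSERT INTO Crewlist (VoyageID, CrewID, CrewRank) VALUES (10, 1, 'Deckhand');
""")


def try_insert(voyage_id: int, crew_id: int) -> str:
    """Attempt to sign a crew member onto a voyage; report whether the DB accepted it."""
    try:
        con.execute(
            "INSERT INTO Crewlist (VoyageID, CrewID, CrewRank) VALUES (?, ?, 'Deckhand')",
            (voyage_id, crew_id),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    print(try_insert(10, 1))  # same person, same voyage: blocked by the unique index
    print(try_insert(99, 1))  # voyage that does not exist: blocked by the foreign key
```

Because the database itself rejects the duplicate, the block holds no matter which form or query tries to add the row.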
You are now in a position to use the wizards to create a form based on Voyage and a subform based on CrewList with a combobox based on Crew
There are a number of refinements you could add.
Make sure you do not use reserved words like Name and do not put spaces in field names. You will thank yourself later.
See also create form to add records in multiple tables
