How can I achieve this table layout?
From:
TIME    DETAILS1    DETAILS2    DETAILS3
8:00    test a      test a1     test a2
8:00    test b      test b1     test b2
8:00    test c      test c1     test c2
To:
TIME    DETAILS1    DETAILS2    DETAILS3
8:00    test a      test a1     test a2
        test b      test b1     test b2
        test c      test c1     test c2
Since you're using Microsoft Access, it's easier to solve this with SQL and then use the query as the data source for your DataTable or similar class in ASP.NET.
You also didn't provide your database structure, so I assumed a single table named TimeTable with columns ID, Time, Details1, Details2, and Details3.
This solution basically creates a ranked column (Rank) 'grouped' by the Time. So for each different Time, you'll get the related rows with 1, 2, 3, and so on. And the row with Rank equals 1 is the one you want to display the corresponding time. Every row with Rank greater than 1 keeps the Time empty, creating your desired effect.
You'll probably need to adapt this for your table and column names, but if you need help with that please share your table structure.
SELECT
IIf([Rank]=1,[TimeTable].[Time],"") AS MyTime,
TimeTable.Details1,
TimeTable.Details2,
TimeTable.Details3,
DCount("ID","TimeTable","Time = #" & [TimeTable].[Time] & "# AND ID <=" & [TimeTable].[ID]) AS Rank
FROM TimeTable;
I am trying to merge two dataframes but not all records have the primary key.
DF1 is:
EmpID  SNcode  Name
A1     123     Bill
B2     456     Alice
               Carrie
DF2 is:
EmpID  SNcode  Name   Department
A1     123     Bill   Accounts
B2     456     Alice  CustService
       986     Peter
I want the result to be like this:
EmpID  SNcode  Name    Department
A1     123     Bill    Accounts
B2     456     Alice   CustService
               Carrie
       986     Peter
My code below doesn't work:
mydata <- merge(DF1, DF2, by="EmpID",all.y=TRUE)
Could you please help me fix this? I need Carrie and Peter to appear in the results.
These are basic functions that are important to learn and understand for coding in R. Three different functions are used here to get the output exactly the way you want.
First, the tables are named in the order you showed them: t1, t2, and t3, with t3 being the desired output. t1 and t2 are not changed by any of these steps.
There are four types of joins; it's worth reading up on them and practicing with each. Here I used full_join(), which joins t2 and t1 by their common column names. Every row from both tables is kept and nothing is dropped, and t1 and t2 remain unchanged after the join.
Then na.omit() removes the NA's from the joined data. You can see these NA's if you print t3 before running na.omit().
Finally, rownames(t3) <- NULL resets the row names of the joined data. t3 is now cleaned up and matches your desired output format.
The code and output follows:
library(dplyr)            # for full_join()

t3 <- full_join(t2, t1)   # full outer join by common column names
t3 <- na.omit(t3)         # remove the NA's
rownames(t3) <- NULL      # reset the row names
t3                        # print the output
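If you prefer to stay in base R, the original merge() call can also be fixed: all.y = TRUE only keeps unmatched rows from DF2, whereas all = TRUE asks for a full outer join that keeps unmatched rows from both sides. A sketch, with the frames reconstructed from the question (assuming the empty key cells are NA and both frames spell the columns the same way):

```r
DF1 <- data.frame(EmpID  = c("A1", "B2", NA),
                  SNcode = c(123, 456, NA),
                  Name   = c("Bill", "Alice", "Carrie"))
DF2 <- data.frame(EmpID  = c("A1", "B2", NA),
                  SNcode = c(123, 456, 986),
                  Name   = c("Bill", "Alice", "Peter"),
                  Department = c("Accounts", "CustService", NA))

# all = TRUE keeps unmatched rows from BOTH sides (a full outer join),
# so both Carrie and Peter survive
mydata <- merge(DF1, DF2, by = c("EmpID", "SNcode", "Name"), all = TRUE)
```

Carrie and Peter do not share a full key, so they come through as two separate unmatched rows, as in your desired result.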
I have two tables: One with participants and one with an encoding of scores based on birth dates.
The score table looks like this:
score_table
Key | Value
--------------------
01/01/1900 | 15
01/01/1940 | 25
01/01/1950 | 30
All participants with birth dates between 01/01/1900 and 01/01/1940 should get a score of 15.
Participants born between 01/01/1940 and 01/01/1950 should get a score of 25, etc.
My participants' table looks like this:
participant_table
BirthDate | Gender
-----------------------
05/05/1930 | M
02/07/1954 | V
01/11/1941 | U
I would like to add a score to get the output table:
BirthDate | Gender | Score
------------------------------------
05/05/1930 | M | 15
02/07/1954 | V | 30
01/11/1941 | U | 25
I built several solutions for similar problems when the exact values are in the score table (using dplyr::left_join or base::match) or for numbers which can be rounded to another value. Here, the intervals are irregular and the dates arbitrary.
I know I can build a solution by iterating through the score table, using this method:
as.Date("05/05/1930", format="%d/%m/%Y") < as.Date("01/01/1900", format="%d/%m/%Y")
Which returns a Boolean and thus allows me to walk through the scores until I find a date which is bigger and then use the last score. However, there must be a better way to do this.
Maybe I can create some sort of bins from the dataframe, as such:
Bin 1 | Bin 2 | Bin 3
Date 1 : Date 2 | Date 2 : Date 3 | Date 3 : inf
But I don't yet see how. Does anyone see an efficient way to create such bins from a dataframe, so that I can efficiently retrieve scores from this table?
MRE:
Score table:
structure(list(key=c("1/1/1900", "2/1/2013", "2/1/2014","2/1/2015", "4/1/2016", "4/1/2017"), value=c(65,65,67,67,67,68)), row.names=1:6, class="data.frame")
Participant File:
structure(list(birthDate=c("10/10/1968", "6/5/2015","10/10/2017"), Gender=c("M", "U", "F")), row.names=1:3, class="data.frame")
Goal File:
structure(list(birthDate=c("10/10/1968", "6/5/2015","10/10/2017"), Gender=c("M", "U", "F"), Score = c(65,67,68)), row.names=1:3, class="data.frame")
Here is an approach using dplyr's lead() along with sqldf:
library(dplyr)   # for lead()
library(sqldf)

# The upper bound of each interval is the next row's Key
score_table$Key2 <- as.Date(lead(score_table$Key), format="%d/%m/%Y")
score_table$Key  <- as.Date(score_table$Key, format="%d/%m/%Y")
names(score_table) <- c("Date1", "Value", "Date2")

participant_table$BirthDate <- as.Date(participant_table$BirthDate, format="%d/%m/%Y")

sql <- "SELECT p.BirthDate, p.Gender, s.Value AS Score
        FROM participant_table p
        INNER JOIN score_table s
          ON (p.BirthDate >= s.Date1 OR s.Date1 IS NULL) AND
             (p.BirthDate <  s.Date2 OR s.Date2 IS NULL)"
participant_table <- sqldf(sql)
The logic here is to join the participant to the score table using a range of matching dates in the latter. For the edge cases of the first and last rows of the score table, we allow a missing date in either column to represent any date whatsoever. For example, in the last row of the score table, the only requirement for a match is that a date be greater than the lower portion of the range.
I actually do not have R running locally at the moment, but here is a demo link to SQLite showing that the SQL logic works correctly:
Demo
I have found a very simple solution using only arithmetic.
To retrieve a score, I count how many of the score table's dates precede the input date:
rownum <- sum(as.Date(input_date, format="%d/%m/%Y") >
as.Date(score_table$Key, format="%d/%m/%Y"))
Then, the corresponding score can be looked up using:
score <- score_table[["Value"]][rownum]
Thus, the spacing of the dates becomes irrelevant and it works quite fast.
I thought I'd share my solution in case it might be of use.
Thanks everyone for the effort and responses!
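For reference, the same counting can be done in one vectorized call with base R's findInterval(), which returns, for each date, the index of the last key that is less than or equal to it. A sketch (assuming the key column is sorted ascending), using the MRE data from the question:

```r
# MRE data from the question
score_table <- data.frame(key = c("1/1/1900", "2/1/2013", "2/1/2014",
                                  "2/1/2015", "4/1/2016", "4/1/2017"),
                          value = c(65, 65, 67, 67, 67, 68))
participant_table <- data.frame(birthDate = c("10/10/1968", "6/5/2015", "10/10/2017"),
                                Gender = c("M", "U", "F"))

keys   <- as.Date(score_table$key, format = "%d/%m/%Y")
births <- as.Date(participant_table$birthDate, format = "%d/%m/%Y")

# findInterval() gives the index of the last key <= each birth date,
# which is exactly the row whose score applies
participant_table$Score <- score_table$value[findInterval(births, keys)]
```

This avoids the explicit sum over comparisons for every participant.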
I have a scenario where I have to correct historical data. The current data looks like this:
Status_cd  event_id  phase_cd  start_dt  end_dt
110        23456     30        1/1/2017  ?
110        23456     31        1/2/2017  ?

Status_cd  event_id  phase_cd  start_dt  end_dt
110        23456     30        1/1/2017  ?
111        23456     30        1/2/2017  ?
The key columns are status_cd and phase_cd. If either of them changes, the history should be closed out: the start_dt of the next record becomes the end_dt of the previous record.
Here both records are left open, which is not correct.
Please suggest how to handle both scenarios.
Thanks.
How are your history rows ordered in the table? In other words, how do you decide which history rows to compare to see if a value was changed? And how do you uniquely identify a history row entry?
If you order your history rows by start_dt, for example, you can compare the previous and current row values using window functions, like Rob suggested:
UPDATE MyHistoryTable
FROM (
    -- Get source history rows that need to be updated
    SELECT
        history_row_id, -- Change this field to match your table
        MAX(status_cd) OVER(ORDER BY start_dt ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS status_cd_next, -- "status_cd" value of the next history row
        MAX(phase_cd) OVER(ORDER BY start_dt ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS phase_cd_next,
        MAX(start_dt) OVER(ORDER BY start_dt ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS start_dt_next
    FROM MyHistoryTable
    QUALIFY status_cd <> status_cd_next -- "status_cd" changed in the next row
         OR phase_cd <> phase_cd_next   -- "phase_cd" changed in the next row
) src
SET end_dt = src.start_dt_next -- Close the current row with the next row's "start_dt"
WHERE MyHistoryTable.history_row_id = src.history_row_id -- Match source rows to target rows
This assumes you have a column to uniquely identify each history row, called "history_row_id". Give it a try and let me know.
I don't have a TD system to test on, so you may need to futz with the table aliases too. You'll also probably need to handle the edge cases (i.e. first/last rows in the table).
Hi, I'm studying association rules with R and have a question.
In transaction data we consider only buy or non-buy (binary data). I want to know how to perform association rule mining with count data, e.g.:

   item1  item2  item3
1  2      0      1
2  0      1      0
3  1      0      0

The first customer bought two of item1, but in ordinary association rules that count information is ignored.
How can we take that information into account?
Hi, quantitative association rule (QAR) mining may be helpful.
First, divide the value range of every item into intervals and give each interval a unique label. The original dataset can then be transformed into a binary dataset containing those labels.
For example, suppose the original data has the following information for item1:
the first person bought 5 of item1,
the second bought 2,
the third bought 7.
You can divide the value range of item1 into [0, 3), [3, 6) and [6, 9), represented by a1, a2 and a3. The item 'item1' is then replaced by three items, a1, a2 and a3, and the original data becomes:
the first person bought one a2,
the second person bought one a1,
the third person bought one a3.
After doing this for every item, the original dataset has been transformed into a binary dataset.
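In R, this discretisation step can be sketched with base cut(), using the intervals and labels from the example above; the resulting labels can then be encoded as a binary incidence matrix for rule mining (e.g. with the arules package):

```r
# Bin item1's counts into the labelled intervals from the example above
counts <- c(5, 2, 7)                    # item1 counts for persons 1, 2, 3
bins <- cut(counts,
            breaks = c(0, 3, 6, 9),     # intervals [0,3), [3,6), [6,9)
            labels = c("a1", "a2", "a3"),
            right = FALSE)              # left-closed, right-open intervals
as.character(bins)                      # "a2" "a1" "a3"
```

Person 1's count of 5 falls in [3, 6) and so becomes a2, and likewise for the others.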
I have a report where I want to display data obtained by a stored procedure, which returns a single table.
Columns returned from stored procedure
Client #, Client Name, Client Phone #
Project #, Project Description
Test #, Test Name, Test Description
Container #, Container Description, Container Parameter
Relationship between tables
Client - Project: 1-N (one to many)
Project - Test: 1-N (one to many)
Test - Container: 1-N (one to many)
Desired format to be displayed
Client # : XYZ Client Name : Patterson, Celeste Phone # (XXX) XXX-XXXX
Project #: P1 Project Description: Project 1 Data
Test # Test Name Test Description Container # Container Description Container Parameter
T1 Test 1 Test Data C1 Container Data C1 Parameter
T1 Test 1 Test Data C2 Container 2 Data C2 Parameter
T2 Test 2 Test 2 Data C1 Container 1 Data C1 Parameter
T2 Test 2 Test 2 Data C3 Container 3 Data C3 Parameter
Project #: P2 Project Description: Project 2 Data
Test # Test Name Test Description Container # Container Description Container Parameter
T1 Test 1 Test Data C1 Container Data C1 Parameter
T1 Test 1 Test Data C2 Container 2 Data C2 Parameter
T2 Test 2 Test 2 Data C1 Container 1 Data C1 Parameter
T2 Test 2 Test 2 Data C3 Container 3 Data C3 Parameter
You can add three groups:
Group 1: Client
In this group you can add the client description, etc. Set up the other two groups in the same way.
Group 2: Project
Group 3: Test
This will display the data in the format you need.