How to configure begin and end date of SPELL in TraMineR - r

In the SPELL format of TraMineR, for a given individual i, should end date at t and start date at t+1 be the same or incremented by 1?
My dataset is built this way:
id | start | end | state
1 | 2/1/12 | 3/6/12 | "a"
1 | 3/6/12 | 1/14/13 | "b"
1 | 1/14/13| 2/2/13 | "c"
Should I add 1 day to each start beginning at row 2?

The seqformat function of TraMineR expects integer values for its begin and end arguments. If you provide dates, their underlying integer values are used.
seqformat also has an overwrite argument that controls how overlaps are handled. Setting overwrite = FALSE gives the result you would obtain by adding 1 day to the start values of your second and third rows. With the default overwrite = TRUE, the most recent episode overwrites the older one where they overlap.
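To make the shared-boundary issue concrete, here is a minimal Python sketch (plain Python, not TraMineR code) that converts the question's dates to integers and moves each later start one day past the previous end. The tuple layout and the helper name shift_shared_boundaries are assumptions made for illustration, and the rows are assumed to be sorted by id and start. Python's ordinal day numbers differ from R's date integers by a constant offset, so only the differences are meaningful here.

```python
from datetime import date

# Spells from the question as (id, start, end, state); dates read as m/d/yy.
spells = [
    (1, date(2012, 2, 1), date(2012, 3, 6), "a"),
    (1, date(2012, 3, 6), date(2013, 1, 14), "b"),
    (1, date(2013, 1, 14), date(2013, 2, 2), "c"),
]

def shift_shared_boundaries(rows):
    """Return (id, begin, end, state) rows with integer day numbers.

    Hypothetical helper: when a spell starts on or before the day the
    previous spell of the same id ended, its start is moved to the day
    after that end, so the older spell keeps the shared day.
    """
    out = []
    prev_id, prev_end = None, None
    for pid, start, end, state in rows:
        begin_i, end_i = start.toordinal(), end.toordinal()
        if pid == prev_id and begin_i <= prev_end:
            begin_i = prev_end + 1
        out.append((pid, begin_i, end_i, state))
        prev_id, prev_end = pid, end_i
    return out

shifted = shift_shared_boundaries(spells)
for row in shifted:
    print(row)
```

After the shift, no day belongs to two spells, which is the situation the answer describes for overwrite = FALSE.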


Datetime column from Table1 is not matching the DateTime column from Table 2

Hello, I have an issue matching two different datetime columns.
I need to compare the two of them (and their data), but when I put them in the same table (using a datetime relationship) I do not get the match I need:
What I need:
| Datetime_1 | Datetime_2 |
| ---------- | ---------- |
| 01/01/2023 08:00:00 AM | |
... ...
| 01/11/2023 12:00:00 AM | 01/11/2023 12:00:00 AM |
| 01/11/2023 01:00:00 AM | 01/11/2023 01:00:00 AM |
... ...
| 31/01/2023 12:00:00 PM | 31/01/2023 12:00:00 PM |
What I get:
Datetime_1 goes from 01/01/2023 12:00:00AM to 01/31/2023 11:00:00PM (with steps of 1h) and Datetime_2 goes from 01/11/2023 8:15:00 PM to 02/06/2023 7:45:00 PM (with steps of 30min).
I created a relationship between the two of them and did not receive any error:
I already set both columns to Date/Time format in Power Query and in the Data panel.
However, I noticed that my main datetime column doesn't have the hierarchy icon in the Fields panel, while the secondary datetime column has it (but without the hour section):
Also, as I mentioned before, my secondary column ranges between Jan and Feb. I do not understand why this range continues and matches some dates in my main datetime column:
Troubleshooting
Part of the difficulty troubleshooting this is the two columns are formatted differently. Just for now, make sure both are formatted as Long Date Time. When comparing the relationship, do not drag the hierarchy (for the one that has it) into the table but rather just the date itself. When you do, you will see the full timestamp for both columns and the issue will become more clear.
Power BI & Relationships on DateTime
Power BI will only match related rows if the date and time match exactly, so 4/15/2023 12:00:00 AM will not match 4/15/2023 12:00:01 AM. You mentioned one side of the relationship has 30-minute steps while the other has 1-hour steps. Power BI is not going to match up a 1:30 AM and a 1:00 AM value for you. If you want that 1:30 value to match up to 1:00, create another column that truncates the :30 minutes and build your relationship on the truncated column.
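The truncation idea can be illustrated outside Power BI. The following is a minimal Python sketch, not Power BI code; the helper name truncate_to_hour and the sample timestamps are assumptions chosen to resemble the question's data (quarter-past and quarter-to stamps on one side, whole hours on the other).

```python
from datetime import datetime

def truncate_to_hour(ts):
    """Drop minutes, seconds and microseconds so the stamp lands on the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Hypothetical sample values: 30-minute steps that never fall on the hour.
half_hour_stamps = [
    datetime(2023, 1, 11, 20, 15),
    datetime(2023, 1, 11, 20, 45),
    datetime(2023, 1, 11, 21, 15),
]

# Hypothetical hourly keys from the other table.
hourly_keys = {datetime(2023, 1, 11, 20, 0), datetime(2023, 1, 11, 21, 0)}

# An exact comparison finds nothing; the truncated value finds a partner.
exact_matches = [ts for ts in half_hour_stamps if ts in hourly_keys]
truncated_matches = [
    (ts, truncate_to_hour(ts))
    for ts in half_hour_stamps
    if truncate_to_hour(ts) in hourly_keys
]

print(len(exact_matches), len(truncated_matches))
```

The same logic applies to a calculated column: the relationship is built on the truncated value, while the original timestamp is kept for display.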
Time Dimension
I'm not sure of your application so don't know if this will work, but when dealing with time, I try to separate Date and Time into separate columns and have both a Date and Time dimension. Below is my time dimension DAX. You can generate any minute-precise interval with it. Notice the last defined column "timekey". I create a column in my fact table to relate to this key.
DimTime =
var every_n_minutes = 15 /* between 0 and 60; remainders in last hourly slice */
/* DO NOT CHANGE BELOW THIS LINE */
var slice_per_hour = trunc(DIVIDE(60,every_n_minutes),0)
var rtn =
    ADDCOLUMNS(
        SELECTCOLUMNS(
            GENERATESERIES(0, 24*slice_per_hour - 1, 1),
            "hour24", TRUNC(DIVIDE([Value],slice_per_hour),0),
            "mins", MOD([Value],slice_per_hour) * every_n_minutes
        ),
        "hour12", MOD([hour24] + 11,12) + 1,
        "asTime", TIME([hour24],[mins],0),
        "timekey", [hour24] * 100 + [mins]
    )
return rtn
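To see what rows this table contains and how a fact-table key could be derived, here is a rough Python mirror of the same arithmetic. It is a sketch, not DAX; build_dim_time and timekey_for are hypothetical names, and clamping leftover minutes into the last slice of the hour is my reading of the "remainders in last hourly slice" comment.

```python
from datetime import datetime

def build_dim_time(every_n_minutes=15):
    """Generate one row per time slice, following the DimTime column logic."""
    slices_per_hour = 60 // every_n_minutes
    rows = []
    for value in range(24 * slices_per_hour):
        hour24 = value // slices_per_hour
        mins = (value % slices_per_hour) * every_n_minutes
        rows.append({
            "hour24": hour24,
            "mins": mins,
            "hour12": (hour24 + 11) % 12 + 1,
            "timekey": hour24 * 100 + mins,
        })
    return rows

def timekey_for(ts, every_n_minutes=15):
    """Hypothetical fact-table helper: map a timestamp to its slice's timekey."""
    slices_per_hour = 60 // every_n_minutes
    # Minutes past the last full slice are assigned to that last slice.
    slot = min(ts.minute // every_n_minutes, slices_per_hour - 1)
    return ts.hour * 100 + slot * every_n_minutes

dim_time = build_dim_time(15)
print(len(dim_time), dim_time[0], dim_time[-1])
print(timekey_for(datetime(2023, 1, 11, 20, 45)))
```

With a 15-minute interval the table has 96 rows, and a timestamp of 8:45 PM maps to the key 2045.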
As requested, I am turning this into an answer. The reason you're getting these results is that your timestamps will never line up. Yes, it let you create the join, but my guess is that this is only because both fields have the same formatting. Also, it is best practice to split your dates and times into separate date and time dimensions, then join them via a fact table. See also here.

KQL extend to new column with summarize inside

I'm trying to make a table with these columns
type | count
I tried this with no luck
exceptions
| where timestamp > ago(144h)
| extend
type = type, count = summarize count() by type
| limit 100
Any idea on what I'm doing wrong?
You should do this instead:
exceptions
| where timestamp > ago(144h)
| summarize count = count() by type
| limit 100
Explanation:
Use extend when you want to add columns to the result or replace existing ones, for example extend day_of_month = dayofmonth(Timestamp). The record count stays exactly the same in this case (see the documentation for more information).
Use summarize when you want to aggregate multiple records, as in your case. The record count after summarize will usually be smaller than the original record count (see the documentation for more information).
By the way, instead of 144h you can use 6d, which is exactly the same but easier to read :)
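The difference between the two operators can be sketched in plain Python. This is only an analogy, not KQL; the exception type names below are made-up sample data.

```python
from collections import Counter

# Made-up sample records standing in for the exceptions table.
exceptions = [
    {"type": "NullReferenceException"},
    {"type": "TimeoutException"},
    {"type": "NullReferenceException"},
]

# Analogue of extend: every input row survives and gains a column.
extended = [dict(row, type_length=len(row["type"])) for row in exceptions]

# Analogue of summarize count() by type: one output row per distinct type.
counts = Counter(row["type"] for row in exceptions)
summarized = [{"type": t, "count": c} for t, c in counts.items()]

print(len(exceptions), len(extended), len(summarized))
```

The extended list has as many rows as the input, while the summarized list has one row per distinct type, which is why the count belongs in summarize rather than extend.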

Surv function input - right,left or interval censored? In R

I am at the beginning of setting up a survival analysis in R.
I took a look in this book here: https://www.powells.com/book/modeling-survival-data-9780387987842/ but struggle to properly set the data up in the first place. So this is a very basic question to survival analysis as I can not find a good example online.
I'd like to understand how to incorporate the censored data into my Surv() function. I understand the inputs in Surv are:
0 = right censored
1 = event
2 = left censored
3 = interval censored
Right Censored: The time of study ends before an event takes place (ob1)
Left Censored: The event has already happened before the study starts
Event: Typically, death or some other form of expected outcome (marked by x)
Interval Censored: The observation starts at some point in the study and has an event / drops out before end of study (ob5)
Left truncated: Ob 3,4,5 are left truncated
To better show what I am talking about, I sketched the described types of censored data below:
"o" marks beginning of data / first occurrence in data set
"x" marks event
Start of study                                           End of observation
ob1 o-|-----------------------------------------------------------|--------
      |                                                           |
ob2 o-|-------------------------------xo                          |
      |                                                           |
ob3   |        o-----------------------------------xo             |
      |                                                           |
ob4   |                                      o------------------x-|----------o
      |                                                           |
ob5   |                    o----------------------------o         |
      |--------------------------------------------------------------------
    1999                                                        2010
Finally, what I would like to know:
Did I classify ob1-ob5 correctly?
How about the other types of observations?
How do I represent these as input for the Surv function? If, for example, right censoring applies, i.e. the study ends, how does a "0" indicate that? What is the input for the time series when neither an event (1) nor the end of observation (0) occurs? What happens at a time when "nothing" happens?
When and how is interval-censored data marked? With a 3 for both beginning and end?
I can provide some sample code if needed.
Again, thank you for your help on this and valuable questions!

Change column header depending on marked row on Spotfire

I have two cross tables on a single page.
The first cross table is a summary that has Components on the horizontal axis and Facilities on the vertical axis. The cell values show the colors "RED", "YELLOW", or "NA". The second cross table is a drilldown of the marked row in the summary table, with Components on the horizontal axis and Type on the vertical axis. The cell values are a count function.
What I need is to have the color of what I marked show below each component in the drilldown.
Summary
+----------+--------+-------+--------+
| Facility | COMP1 | COMP2 | COMP3 |
+----------+--------+-------+--------+
| FAC1 | NA | RED | RED |
| FAC2 | YELLOW | NA | RED |
| FAC3 | RED | RED | YELLOW |
+----------+--------+-------+--------+
Drilldown (If I mark the FAC2 row)
+-------+--------+-------+
| Type | COMP1 | COMP3 |
+ + YELLOW + RED +
|-------|--------|-------|
| TYPE1 | 12 | |
| TYPE2 | 11 | 4 |
+-------+--------+-------+
Does anyone know if this is possible with cross tables? Any tips on how to do it? I appreciate the help.
Thanks,
John
Edit: I'm doing this to go around not being able to color column headers of a cross table, so if anyone has an alternative, I would appreciate it.
Currently using Spotfire 7.11
Okay. Bear with me here, as I have hacked together a solution. I made some assumptions about your data structure, so depending on how your data is actually structured, the answer may need to be modified slightly.
Here is the structure of my data:
Step 1: Create two document properties to hold the values of the title. I created two document properties named "tableTitle1" and "tableTitle2" (one for each column in the details cross table). Create one document property to hold a DateTime value that an r script will pass us (will discuss later). I named mine "time".
Step 2: Create the cross tables as you have them. Ensure the first cross table is using Marking "Marking" and the second is limited by the marking "Marking". In the second cross table, ensure that the titles look something like this: Count([Comp1]) as [Comp1 ${tableTitle1}], Count([Comp3]) as [Comp2 ${tableTitle2}]. You need to use the document properties created in Step 1.
Step 3: Create the python script. The code is as follows:
from System.Collections.Generic import List
from Spotfire.Dxp.Data import *

# Create a cursor for the table column to get the values from.
# Add a reference to the data table in the script.
dataTable = Document.Data.Tables["SOTest"]
cursor = DataValueCursor.CreateFormatted(dataTable.Columns["Comp1"])

# Retrieve the marking selection
markings = Document.Data.Markings["Marking"].GetSelection(dataTable).AsIndexSet()

# Create a List object to store the retrieved data marking selection
markedata = List[str]()

# Iterate through the data table rows to retrieve the marked rows
for row in dataTable.GetRows(markings, cursor):
    value = cursor.CurrentValue
    if value != str.Empty:
        markedata.Add(value)

# Get only unique values
valData = List[str](set(markedata))

# Store in a document property
Document.Properties["tableTitle1"] = ', '.join(valData)

#### DO IT AGAIN FOR THE SECOND COLUMN ####
# Create a cursor for the table column to get the values from.
cursor = DataValueCursor.CreateFormatted(dataTable.Columns["Comp2"])

# Create a List object to store the retrieved data marking selection
markedata = List[str]()

# Iterate through the data table rows to retrieve the marked rows
for row in dataTable.GetRows(markings, cursor):
    value = cursor.CurrentValue
    if value != str.Empty:
        markedata.Add(value)

# Get only unique values
valData = List[str](set(markedata))

# Store in a document property
Document.Properties["tableTitle2"] = ', '.join(valData)
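The script applies the same collect, deduplicate and join logic to each column. That logic can be sketched in plain Python, with no Spotfire API involved; the helper name unique_joined is hypothetical. Unlike set(), whose iteration order is not guaranteed, this version keeps values in first-seen order.

```python
def unique_joined(values, separator=", "):
    """Join the distinct non-empty values, keeping first-seen order."""
    seen = []
    for value in values:
        if value != "" and value not in seen:
            seen.append(value)
    return separator.join(seen)

# Hypothetical marked values from one component column.
print(unique_joined(["RED", "", "RED", "YELLOW"]))
```

In the Spotfire script, the values passed in would be the cursor.CurrentValue strings gathered for one column, and the result would be assigned to the matching document property.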
Step 4: Create an R Script to kick off the python script when data is marked. This is going to be a very simple R Script. The code is as follows:
markedTable <- inputTable
time <- Sys.time()
The check box for allowing caching should be unchecked. The output parameter time should go to the document property time. The input parameter inputTable should be your data table, all columns, and should be limited by Marking. Ensure that the "Refresh function automatically" checkbox is checked.
Step 5: Map the python script to the time document property. In the Edit > Document Properties dialogue box, under Properties, assign the python script we created to the document property. The R script will change the current datetime each time the marking on the table changes, thus running our python script for us.
Step 6: Watch the magic happen.

Renaming a column with value from another column plus sequence

I have a table TBL_CLASS with the structure
CLASS_NAME | CLASS_TYPE
Maths PRI
Math SEC
English PRI
English PAR
Physics PAR
Biology BIO
What I want to do is to update the ClassCode column with the value Maths and a sequence starting from 0001 instead of 1 so that the table above will become
CLASS_NAME | CLASS_TYPE | CLASS_CODE
Maths MATH MATH0001
Math MATH MATH0002
English ENG ENG0001
English ENG ENG0002
Physics PHY PHY0001
Biology BIO BIO0001
Is it possible to do this just in the database without filling the first column with MATH0001 or ENG0001 for each column or creating a temp column?
I could create a temp column with the sequence but I'd still have to fill in the first number into the column and then selecting the MAX
SELECT MAX(substr(TEMP_CODE, -4)) AS LAST4DIGIT FROM TBL_CLASS WHERE CLASS_TYPE= 'MATHS'
And then updating the column with the update statement
UPDATE TBL_CLASS SET CLASS_CODE = CLASS_TYPE||TEMP_COL
What I'd like to achieve is to update the ClassCode with a statement like
UPDATE TBL_CLASS SET CLASS_CODE = CLASS_NAME|| <sequence> without filling in the first value (0001) for each class and adding sequence into the temp column.
Appreciate any pointers on how I could approach this.
This SQL shows how you can manipulate the column output. You can use it to update your table if that's what you want. If you need to add a new row (e.g. Math) and need to find its index number, you can use e.g. REGEXP_SUBSTR to extract only the digits, then remove the leading zeros (and add 1).
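SELECT class_name,
       prefix AS class_type,
       -- Partition on the derived prefix: an unqualified class_type here would
       -- refer to the table's existing CLASS_TYPE column, not the alias above.
       -- Ordering by class_name within a prefix is an arbitrary choice.
       prefix
       || LPAD (ROW_NUMBER () OVER (PARTITION BY prefix ORDER BY class_name),
                4,
                '0')
          AS class_code
  FROM (SELECT class_name,
               -- 4 letters for names starting with MATH, otherwise 3
               SUBSTR (
                  UPPER (class_name),
                  1,
                  CASE WHEN UPPER (class_name) LIKE 'MATH%' THEN 4 ELSE 3 END)
                  AS prefix
          FROM tbl_class)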
EDIT: I didn't realize that the WITH clause could cause misunderstanding, so I have deleted it.
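The numbering logic (a prefix derived from the class name plus a zero-padded counter per prefix) can be sketched in Python to check the expected output. This is an illustration rather than something to run against the database; class_prefix and assign_class_codes are hypothetical names, and numbering follows input order, which is an assumption.

```python
from collections import defaultdict

def class_prefix(class_name):
    """First 4 letters for names starting with MATH, otherwise the first 3."""
    upper = class_name.upper()
    return upper[:4] if upper.startswith("MATH") else upper[:3]

def assign_class_codes(class_names):
    """Return (class_name, class_type, class_code) tuples in input order."""
    counters = defaultdict(int)
    result = []
    for name in class_names:
        prefix = class_prefix(name)
        counters[prefix] += 1
        # Zero-pad the per-prefix counter to four digits, like LPAD(..., 4, '0').
        result.append((name, prefix, "%s%04d" % (prefix, counters[prefix])))
    return result

codes = assign_class_codes(
    ["Maths", "Math", "English", "English", "Physics", "Biology"]
)
for row in codes:
    print(row)
```

For the sample rows in the question this reproduces the desired codes, for example MATH0001 and MATH0002 for the two maths rows.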
