Grouping multiple dates from json in klipfolio - klipfolio

I have JSON-formatted input in the following form:
[
{
"hash":"abcdefg",
"Stage 1 Status":"Complete",
"Stage 1 Completion":"2021-01-16T19:56:10+02:00",
"Stage 2 Status":"Complete",
"Stage 2 Completion":"2021-02-17T16:30:30+03:00",
"Stage 3 Status":"Complete",
"Stage 3 Completion":"2021-03-17T16:30:34+03:00"
},
{
"hash":"klmnop",
"Stage 1 Status":"Complete",
"Stage 1 Completion":"2021-01-16T19:56:10+02:00",
"Stage 2 Status":"Open",
"Stage 2 Completion":"2021-02-17T16:30:34+03:00"
},
{
"hash":"jklmn",
"Stage 1 Status":"Complete",
"Stage 1 Completion":"2021-01-16T19:56:10+02:00",
"Stage 2 Status":"Lost",
"Stage 2 Completion":"2021-07-17T16:30:30+03:00"
}
]
I want to make a Klip in Klipfolio showing the completed stages for each month, like the output below, which is derived from the above data.
         January  February  March
Stage1   3        0         0
Stage2   0        1         0
Stage3   0        0         1
The data covers three years, and I need to show each year's counts in a separate table.
I am new to Klipfolio; so far I have constructed arrays for the table column titles and row tags, but I am having difficulty with the query.

To accomplish this in Klipfolio, you first need to organize your JSON data into columnar arrays so you can query them. You need one array each for the stage name, the completion date, and the status.
For the stage name, you can use kf:names and the LEFT() function. This will look like:
LEFT(#kf:names(/,FALSE),7)
By default, kf:names returns data in alphabetical order; the FALSE parameter returns the names in the order they appear in the JSON structure, from top to bottom.
LEFT() returns the 7 leftmost characters, so you get "Stage 1" instead of "Stage 1 Status".
To return the dates, use the wildcard selector together with the contains() function to return the values of every field whose name contains "Completion".
#/*[contains(name(),'Completion')]
Finally, to return the status, do the same as above, but search for field names that contain "Status".
#/*[contains(name(),'Status')]
From here you should have 3 arrays of data you can query. The first array, which returns every field name in your JSON structure, first needs to be filtered so that it has the same number of items as the date and status arrays. You can use SELECT() to keep only the fields whose names contain 'Completion' or only those whose names contain 'Status' (one of the two, not both), since there is a one-to-one relationship between these fields and the values. Filtering on "Status" looks like this:
SELECT(LEFT(#kf:names(/,FALSE),7),CONTAINS(#kf:names(/,FALSE),"Status"))
This will return 7 items for the sample data, matching the 7 dates and 7 statuses.
Now that the data is in the proper shape, a LOOKUP() can be used in each month column to align the count of completed records with each stage, using the SELECT(), AND(), GROUP() and COUNTDISTINCT() functions.
First, reference the column with the stage names in the first parameter of LOOKUP() (&Stages in the full formula). Then use a SELECT() to keep only the completed stages that fall in a specific month. For the dates, convert to Unix time with DATE() and then use DATEVALUE() to roll the value up to a month in "yyyyMM" format, which returns "202101" for a date in January 2021.
The SELECT() looks like this:
SELECT(SELECT(LEFT(#kf:names(/,FALSE),7),CONTAINS(#kf:names(/,FALSE),"Status")),AND(DATEVALUE(DATE(#/*[contains(name(),'Completion')],"yyyy-MM-dd"),"yyyyMM")="202101",#/*[contains(name(),'Status')]="Complete"))
In the second parameter of LOOKUP(), wrap a GROUP() around the SELECT() to group like values; in the third parameter, wrap the same SELECT() in a COUNTDISTINCT() to count the items per group. The whole formula for the January 2021 column would look like so:
LOOKUP(&Stages,GROUP(SELECT( SELECT(LEFT(#kf:names(/,FALSE),7),CONTAINS(#kf:names(/,FALSE),"Status")), AND(DATEVALUE(DATE(#/*[contains(name(),'Completion')],"yyyy-MM-dd"),"yyyyMM")="202101", #/*[contains(name(),'Status')]="Complete"))), COUNTDISTINCT(SELECT( SELECT(LEFT(#kf:names(/,FALSE),7),CONTAINS(#kf:names(/,FALSE),"Status")), AND(DATEVALUE(DATE(#/*[contains(name(),'Completion')],"yyyy-MM-dd"),"yyyyMM")="202101", #/*[contains(name(),'Status')]="Complete"))))
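From there, you can change the 202101 in the formula to any other yyyyMM value to return a particular month's data, e.g. 202102 for the next column, February. Because the year is part of the yyyyMM value, the same formula also serves the separate per-year tables.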
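As a cross-check outside Klipfolio, the expected counts can be reproduced with a short R sketch. This is not a Klipfolio formula: the `stages` data frame is a hand-typed, long-format copy of the sample JSON (one row per stage per record, dates truncated to the day), and all names in it are made up for illustration.

```r
# Hand-typed long-format copy of the sample JSON (an assumption for
# illustration): one row per stage per record.
stages <- data.frame(
  stage      = c("Stage 1", "Stage 2", "Stage 3",
                 "Stage 1", "Stage 2",
                 "Stage 1", "Stage 2"),
  status     = c("Complete", "Complete", "Complete",
                 "Complete", "Open",
                 "Complete", "Lost"),
  completion = c("2021-01-16", "2021-02-17", "2021-03-17",
                 "2021-01-16", "2021-02-17",
                 "2021-01-16", "2021-07-17"),
  stringsAsFactors = FALSE
)

# Keep only completed stages, then roll each date up to "yyyy-MM".
done <- stages[stages$status == "Complete", ]
done$month <- substr(done$completion, 1, 7)

# Cross-tabulate stage against month.
counts <- table(done$stage, done$month)
print(counts)
```

The resulting table has one row per stage and one column per month, which is the same shape as the Klip described in the question.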

Related

I am making a for/if loop and I am missing a step somewhere and I cant figure it out

Below is my objective and the code I wrote for it. Row 19 is the original street text and 24 is where street2 is located.
https://www.opendataphilly.org/dataset/shooting-victims/resource/a6240077-cbc7-46fb-b554-39417be606ee (where the .csv is)
Let's deal with the streets with '&' separating their names. Create a new column named street2 and set it equal to NA.
Then, iterate over the data frame using a for loop, testing if the street variable you created earlier contains an NA value.
In cases where this occurs, separate the names in block according to the & delimiter into the fields street and street2 accordingly.
Output the first 5 lines of the data frame to the screen.
Hint: for; if; :; nrow(); is.na(); strsplit(); unlist().
NewLocation$street2 <- 'NA'
Task7 <- unlist(NewLocation)
for (col in seq (1:dim(NewLocation)[19])) {
if (Task7[street2]=='NA'){
for row in seq (1:dim(NewLocation[24])){
NewLocation[row,col] <-strsplit(street,"&",(NewLocation[row,col]))
}
}
}

Working on loop and wanting some feedback, re-adding this to update code and list .csv

Access to the data:
https://www.opendataphilly.org/dataset/shooting-victims/resource/a6240077-cbc7-46fb-b554-39417be606ee
I have gotten close and got my loop to run, but I have not gotten the output I want.
I want to split street at any '&' and put the second part in a column called street2.
Main objective, explained: Let's deal with the streets with & separating their names. Create a new column named street2 and set it equal to NA.
Then, iterate over the data frame using a for loop, testing if the street variable you created earlier contains an NA value.
In cases where this occurs, separate the names in block according to the & delimiter into the fields street and street2 accordingly.
Output the first 5 lines of the data frame to the screen.
Hint: mutate(); for; if; :; nrow(); is.na(); strsplit(); unlist().
library('readr')
NewLocation$street2 <- 'NA'
#head(NewLocation)
Task7 <- unlist(NewLocation$street2)
for (row in seq(from=1,to=nrow(NewLocation))){
if (is.na(Task7[NewLocation$street])){
NewLocation$street2 <-strsplit(NewLocation$street,"&",(NewLocation[row]))
}
}
This sets every street2 value equal to street and gets rid of my NAs.
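One possible way to write the loop the assignment describes is sketched below. It is only a sketch under assumptions: the three-row `NewLocation` data frame is a made-up stand-in for the real .csv, and it assumes that the original text lives in a column called `block` and that `street` is already NA for rows whose block contains '&'.

```r
# Made-up stand-in for the real data (column contents are assumptions).
NewLocation <- data.frame(
  block  = c("1200 N BROAD ST", "5TH ST & MARKET ST", "A ST & B ST"),
  street = c("N BROAD ST", NA, NA),
  stringsAsFactors = FALSE
)

# A real missing value, not the string 'NA', so that is.na() can detect it.
NewLocation$street2 <- NA_character_

for (row in seq_len(nrow(NewLocation))) {
  # Test one row at a time instead of the whole column.
  if (is.na(NewLocation$street[row])) {
    # strsplit() returns a list; unlist() turns it into a character vector.
    parts <- trimws(unlist(strsplit(NewLocation$block[row], "&", fixed = TRUE)))
    NewLocation$street[row]  <- parts[1]
    NewLocation$street2[row] <- parts[2]
  }
}

head(NewLocation, 5)
```

The two differences from the posted code are that the assignments index a single row (`[row]`) rather than overwriting the whole column, and that street2 starts as a true NA rather than the text 'NA'.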

R - Filter any rows and show all columns

I would like an output that shows the names of the columns that have rows containing a given string value. Assume the following...
Animals        Sex
I like Dogs    Male
I like Cats    Male
I like Dogs    Female
I like Dogs    Female
Data Missing   Male
Data Missing   Male
I found an SO thread here; David Arenburg provided an answer which works very well, but I was wondering if it is possible to get an output that doesn't show all the rows. So if I want to find the string "Data Missing", the output I would like to see is...
Animals
Data Missing
or
Animals
TRUE
instead of
Animals        Sex
Data Missing   Male
Data Missing   Male
I have also found that filtering with df$columnName works, but I have a big file with a large number of columns, so typing the column names would be tedious. Assume the string "Data Missing" also appears in other columns, and that there could be other strings to search for. That is why I like David Arenburg's answer; bear in mind that my real data has more than the two columns in the sample above.
Cheers
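One thing you could do is grep each column for "Data Missing" like this:
# lapply() always returns a list: one vector of matching row indices per column
x <- lapply(data, grep, pattern = "Data Missing")
# TRUE for every column with at least one match
lengths(x) > 0
For the sample data this returns a named logical vector:
Animals     Sex
   TRUE   FALSE
which covers the Animals/TRUE result you're after. It's also good because it checks all columns, which you mentioned was something you wanted.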
If we want only the first row where it matches, use match():
data[match("Data Missing", data$Animals), "Animals", drop = FALSE]
# Animals
#5 Data Missing
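If only the names of the matching columns are wanted, plus the matched value shown once per column, something along these lines might work. It is a sketch rather than a tested solution for the real file; `data` is rebuilt from the sample in the question, and `has_match`, `matching_cols` and `unique_hits` are names invented here.

```r
# Sample data rebuilt from the question.
data <- data.frame(
  Animals = c("I like Dogs", "I like Cats", "I like Dogs",
              "I like Dogs", "Data Missing", "Data Missing"),
  Sex     = c("Male", "Male", "Female", "Female", "Male", "Male"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column: does any cell contain the string?
has_match <- vapply(
  data,
  function(col) any(grepl("Data Missing", col, fixed = TRUE)),
  logical(1)
)

# Names of the columns that contain the string.
matching_cols <- names(data)[has_match]

# For those columns only, the distinct matching values (no repeated rows).
unique_hits <- lapply(
  data[has_match],
  function(col) unique(col[grepl("Data Missing", col, fixed = TRUE)])
)

print(matching_cols)
print(unique_hits)
```

Because every column is scanned, no column names need to be typed by hand, which seems to be the main constraint in the question.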

Modify string names in a data frame based on a condition

I have a data frame with a variable called "Control_Category". The variable has six names in it, which for simplicity sake I am going to make generic:
df <- data.frame(Control_Category = c("Really Long Name One",
"Super Really Long Name Two",
"Another Really Flippin' Long Name Three",
",Seriously, It's a Fourth Long Name",
"Definitely a Fifth Long Name",
"Finally, This guy is done, number six"))
I'm using this to make a slight joke. So, while the names are long they are tidy in that the values for each (1-6) are consistent. In this specific character vector of the data.frame, there are hundreds and hundreds of entries that match any one of those six.
What I need to do is to replace the long names with a short name. Therefore, where any of the above names are identified, replace that name with a shorter version, like:
One
Two
Three
Four
Five
Six
I tried a function using 'case_when' and it failed miserably. Any help would be appreciated.
Additional Information Based on Questions From Community
The order of the items doesn't matter. There isn't a designation of 1 - 6. There just happen to be six and I made six stupid long strings. The strings themselves are long.
So, anywhere "Super Really Long Name Two" exists, that value needs to be updated to something like "TWO" or a "Short_Name" that approximates "TWO". In reality, the category is called "Audit, Testing and Examination Results". The short name would ideally just be "AUDIT".
You could just use gsub() once for each replacement:
df$Control_Category <- gsub('Really Long Name One', 'One', df$Control_Category)
You can repeat similar logic to handle the other five long/short name pairs.
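Rather than six separate gsub() calls, the pairs could also be kept in one named vector and looked up by exact match. The sketch below uses a cut-down, made-up `df` with only two of the six names; `short_names` is a name invented here.

```r
# Cut-down example data (only two of the six names, for brevity).
df <- data.frame(
  Control_Category = c("Really Long Name One",
                       "Super Really Long Name Two",
                       "Really Long Name One"),
  stringsAsFactors = FALSE
)

# Lookup table: long names are the vector names, short names the values.
# The remaining long/short pairs would be added in the same way.
short_names <- c(
  "Really Long Name One"       = "One",
  "Super Really Long Name Two" = "Two"
)

# Index the lookup by the existing values; as.character() guards against
# the column being a factor. Values missing from the lookup become NA.
df$Control_Category <- unname(short_names[as.character(df$Control_Category)])

print(df)
```

Unlike gsub(), this matches whole values only, so a long name that happens to contain another long name as a substring is not altered by accident.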
Here's a larger data frame with long names:
set.seed(101)
long_names <- c("Really Long Name One",
"Super Really Long Name Two",
"Another Really Flippin' Long Name Three",
",Seriously, It's a Fourth Long Name",
"Definitely a Fifth Long Name",
"Finally, This guy is done, number six")
df <- data.frame(control_category=sample(long_names, 100, replace=TRUE))
head(df)
## control_category
## 1 Another Really Flippin' Long Name Three
## 2 Really Long Name One
## 3 Definitely a Fifth Long Name
## 4 ,Seriously, It's a Fourth Long Name
## 5 Super Really Long Name Two
## 6 Super Really Long Name Two
Using the unique function will give you the category names:
category <- unique(df$control_category)
print(category)
## [1] Another Really Flippin' Long Name Three
## [2] Really Long Name One
## [3] Definitely a Fifth Long Name
## [4] ,Seriously, It's a Fourth Long Name
## [5] Super Really Long Name Two
## [6] Finally, This guy is done, number six
## 6 Levels: ,Seriously, It's a Fourth Long Name ...
Notice that the factor levels are in alphabetical order (see levels(category)), which is not the order you want for the short labels. The simplest fix here is to reorder them by hand after looking at the current order: category[c(2, 5, 1, 4, 3, 6)] puts the six names in the order one through six. Finally,
df$control_category <- factor(
df$control_category,
levels=category[c(2, 5, 1, 4, 3, 6)],
labels=c("one", "two", "three", "four", "five", "six")
)
head(df)
## control_category
## 1 three
## 2 one
## 3 five
## 4 four
## 5 two
## 6 two

Selecting strings and using in logical expressions to create new variable - R

I have a categorical variable indicating location of flu clinics as well as an "other" category. Participants who select the "other" category give open-ended responses for their location. In most cases, these open-ended responses fit with one of the existing categories (for example, one category is "public health clinic", but some respondents picked "other" and cited "mall" which was a public health clinic). I could easily do this by hand but want to learn the code to select "mall" strings then use logical expressions to assign these people to "public health clinic" (e.g. create a new variable for location of flu clinics).
My categorical variable is "lrecflu2" and my character string variable is "lfother"
So far I have:
mall <- grep("MALL", Motiv82012$lfother, value = TRUE)
This gives me a vector with all the string responses containing "MALL" (all strings are in caps in the dataframe)
How do I use this vector in a logical expression to create a new variable that assigns these people to the "public health clinic" category and assigns the original value of flu clinic location variable for people that did not select "other" (and do not have values in the character string variable) to the new flu clinic location variable?
Perhaps, grep is not even the right function to be using.
As I understand it, you have a column in a data frame, where you want to reassign one character value to another. If so, you were almost there...
set.seed(1) # for generating an example
df1 <- data.frame(flu2=sample(c("MALL","other","PHC"),size=10,replace=TRUE))
df1$flu2[grep("MALL",df1$flu2)] <- "PHC"
Here grep() is giving you the required vector index; you then subset the vector based on this and change those elements.
Update 2
This should produce a data.frame similar to the one you are using:
set.seed(1)
lreflu2 <- sample(c("PHC","Med","Work","other"),size=10,replace=TRUE)
Ifother <- rep("",10) # blank character vector
s1 <- c("Frontenac Mall","Kingston Mall","notMALL")
Ifother[lreflu2=="other"] <- s1
df1 <- data.frame(lreflu2,Ifother)
### alternative:
### df1 <- data.frame(lreflu2,Ifother, stringsAsFactors = FALSE)
df1
gives:
lreflu2 Ifother
1 Med
2 Med
3 Work
4 other Frontenac Mall
5 PHC
6 other Kingston Mall
7 other notMALL
8 Work
9 Work
10 PHC
If you're looking for an exact string match you don't need grep at all:
df1$lreflu2[df1$Ifother=="MALL"] <- "PHC"
Using a regex:
df1$lreflu2[grep("Mall",df1$Ifother)] <- "PHC"
gives:
lreflu2 Ifother
1 Med
2 Med
3 Work
4 PHC Frontenac Mall
5 PHC
6 PHC Kingston Mall
7 other notMALL
8 Work
9 Work
10 PHC
Whether Ifother is a factor or a character vector doesn't affect things. Note that data.frame() coerces string vectors to factors by default only in R versions before 4.0.0; from 4.0.0 onward, stringsAsFactors defaults to FALSE.
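Since the question asks for a new variable rather than overwriting the original, one option is ifelse() combined with grepl(). The sketch below uses the question's column names (lrecflu2, lfother) on a made-up four-row data frame; `is_mall` and `location_new` are names invented here.

```r
# Made-up example using the column names from the question.
df1 <- data.frame(
  lrecflu2 = c("PHC", "other", "other", "Work"),
  lfother  = c("", "FRONTENAC MALL", "PHARMACY", ""),
  stringsAsFactors = FALSE
)

# TRUE where the respondent chose "other" and the free text mentions MALL.
is_mall <- df1$lrecflu2 == "other" & grepl("MALL", df1$lfother, fixed = TRUE)

# New variable: recode the mall responses, keep the original value otherwise.
df1$location_new <- ifelse(is_mall, "PHC", df1$lrecflu2)

print(df1)
```

grepl() returns a logical vector of the same length as the column, which makes it easier to combine with other conditions than the indices or values returned by grep().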

Resources