Extract every element that matches a pattern in a string in R


I have a string that is essentially a SQL statement, and I want to extract part of it.
Here is the code:
SELECT
DTE as "Date",
CURRENT_DATE AS "Day",
concat( BCCO, BCBCH ) AS "client/batch",
BCSTAT as "Batch Status",
CASE
WHEN EXC = 'MCR' THEN CNT
ELSE 0
END AS "MCR-NPR",
CASE
WHEN EXC = 'NRC' THEN CNT
ELSE 0
END AS "NRC-NPR",
CASE
WHEN EXC = 'OFD' THEN CNT
ELSE 0
END AS "OFD-NPR",
CASE
WHEN EXC = 'TDB' THEN CNT
ELSE 0
END AS "TDB-NPR",
CASE
WHEN EXC = 'TDC' THEN CNT
ELSE 0
END AS "TDC-NPR",
CASE
WHEN EXC = 'UDC' THEN CNT
ELSE 0
END AS "UDC-NPR",
CASE
WHEN EXC = 'BIN' THEN CNT
ELSE 0
END AS "BIN-WRN",
CASE
WHEN EXC = 'DSP' THEN CNT
ELSE 0
END AS "DSP-WRN",
I want to extract every element between END AS and the closing quote. A vector like ("MCR-NPR",...,"DSP-WRN") would be the desired output.
I know I may need to use a regular expression, but I couldn't extract every one of them.
Any ideas will be appreciated.
Best,

1) grep/read.table: Use grep to pick out the lines containing END AS, then read those lines with read.table using a double quote as the separator. The second column holds the desired data. No regular expressions or packages are used.
read.table(text = grep("END AS", s, value = TRUE, fixed = TRUE),
sep = '"', as.is = TRUE)[[2]]
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
1a) This is similar to (1) but uses sub with a regular expression instead of read.table:
sub('.*END AS "(.+)".*', "\\1", grep("END AS", s, value = TRUE))
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
2) strapplyc: Another approach relies on the fact that the desired strings follow END AS and are surrounded by double quotes. It has the shortest code of the ones shown here.
library(gsubfn)
unlist(strapplyc(s, 'END AS "(.+)"'))
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
3) strcapture Another base R approach using the same pattern as in (2) is:
na.omit(strcapture('END AS "(.+)"', s, list(value = character(0))))
giving:
value
9 MCR-NPR
13 NRC-NPR
17 OFD-NPR
21 TDB-NPR
25 TDC-NPR
29 UDC-NPR
33 BIN-WRN
37 DSP-WRN
Note
The input s in reproducible form:
s <-
c("SELECT ", " DTE as \"Date\",", " CURRENT_DATE AS \"Day\",",
" concat( BCCO, BCBCH ) AS \"client/batch\",", " BCSTAT as \"Batch Status\",",
" CASE ", " WHEN EXC = 'MCR' THEN CNT ", " ELSE 0 ", " END AS \"MCR-NPR\",",
" CASE ", " WHEN EXC = 'NRC' THEN CNT ", " ELSE 0 ", " END AS \"NRC-NPR\",",
" CASE ", " WHEN EXC = 'OFD' THEN CNT ", " ELSE 0 ", " END AS \"OFD-NPR\",",
" CASE ", " WHEN EXC = 'TDB' THEN CNT ", " ELSE 0 ", " END AS \"TDB-NPR\",",
" CASE ", " WHEN EXC = 'TDC' THEN CNT ", " ELSE 0 ", " END AS \"TDC-NPR\",",
" CASE ", " WHEN EXC = 'UDC' THEN CNT ", " ELSE 0 ", " END AS \"UDC-NPR\",",
" CASE ", " WHEN EXC = 'BIN' THEN CNT ", " ELSE 0 ", " END AS \"BIN-WRN\",",
" CASE ", " WHEN EXC = 'DSP' THEN CNT ", " ELSE 0 ", " END AS \"DSP-WRN\"")
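For readers who want to see the same idea outside R, here is a minimal Python sketch of approach (1a): capture whatever sits between the double quotes that follow the literal END AS. The list sql_lines is a shortened, hypothetical stand-in for the vector s above, and the pattern uses [^"]+ rather than .+ so that the match stops at the first closing quote; that change is an assumption of this sketch, not part of the answer.

```python
import re

# Hypothetical, shortened stand-in for the R vector `s`: one SQL line per element.
sql_lines = [
    'SELECT',
    'DTE as "Date",',
    'CASE',
    "WHEN EXC = 'MCR' THEN CNT",
    'ELSE 0',
    'END AS "MCR-NPR",',
    'CASE',
    "WHEN EXC = 'DSP' THEN CNT",
    'ELSE 0',
    'END AS "DSP-WRN"',
]

# Capture the text between the double quotes that follow the literal END AS.
# [^"]+ stops at the first closing quote instead of matching greedily.
pattern = re.compile(r'END AS "([^"]+)"')

# Collect every captured alias, line by line, in order of appearance.
aliases = [m.group(1) for line in sql_lines for m in pattern.finditer(line)]
print(aliases)
```

Lines such as DTE as "Date" are skipped because they do not contain END AS, which mirrors the grep step in (1) and (1a).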

Related

DAX formula to show not repeated values and count them

I have a table in which the column "CODE" has values like this:
FTRA2
BRB92
RBRB4
XYZ
SXM4
RBRB4
NLDR
XYZ
FTRA2
POEU
FTRA2
I currently have this formula
="[ Unique values " & DISTINCTCOUNT(MyTable[CODE]) & "]
" & CONCATENATEX(DISTINCT(MyTable[CODE]), MyTable[CODE] ,", ")
that outputs this:
[ Unique values 7 ]
FTRA2, BRB92, RBRB4, XYZ, SXM4, NLDR, POEU
I would like to show all the unique values and their count (except those with the string "XYZ") and, below that, show how many "XYZ" values there are, like this:
[ Unique values 6 ]
FTRA2, BRB92, RBRB4, SXM4, NLDR, POEU
[2 XYZ values]
In this case there are 2 "XYZ" values, but there could be zero "XYZ" values too.
I'm using Excel 2016.
How can I do this? Thanks in advance.
UPDATE1
I get this error when trying Joe's solution.
UPDATE2
Joe, I was able to make the first part of your formula work by modifying it like this:
= VAR ExcludeValue = "XYZ"
RETURN
CALCULATE(
"[ Unique values " & DISTINCTCOUNT(MyTable[Code]) & " ]
" & CONCATENATEX(DISTINCT(MyTable[Code]), [Code], ", ")
, MyTable[Code] <> ExcludeValue
)
But when I add the second part, I get this error:
This formula is invalid or incomplete: 'Calculation error in
measure 'MyTable[Code]: The function COUNT takes an argument
that evaluates to numbers or dates and cannot work with values
of type String.'.
I also removed UNICHAR, since it doesn't work in Excel.
UPDATE3
Joe's solution works correctly after I changed COUNT(MyTable[Code]) to COUNTROWS(MyTable).
The final solution looks like this.
=VAR ExcludeValue = "XYZ"
RETURN
CALCULATE(
"
[ Unique values " & DISTINCTCOUNT(MyTable[Code]) & " ]
" & CONCATENATEX(DISTINCT(MyTable[Code]), [Code], ", ")
, MyTable[Code] <> ExcludeValue
) & "
" & CALCULATE(
"[" & COUNTROWS(MyTable) & " " & ExcludeValue & " values]"
, MyTable[Code] = ExcludeValue
) & "
"
Update4
Printing nothing when there are no "XYZ" values works with your IF() addition. I've tried to follow your logic to do the same when there are no values at all: I added an IF() that checks whether the count of rows where MyTable[Code] <> ExcludeValue is greater than 0 and, if true, runs the original CALCULATE, otherwise returns BLANK(), but it doesn't work.
CountLabel =
VAR ExcludeValue = "XYZ"
RETURN
IF(
CALCULATE(COUNTROWS(MyTable), MyTable[Code] <> ExcludeValue) > 0,
CALCULATE(
"[ Unique values " & DISTINCTCOUNT(MyTable[Code]) & " ]"
& UNICHAR(10) &
CONCATENATEX(DISTINCT(MyTable[Code]), [Code], ", ")
, MyTable[Code] <> ExcludeValue
),
BLANK()
)
& IF(
CALCULATE(COUNTROWS(MyTable), MyTable[Code] = ExcludeValue) > 0,
UNICHAR(10) & " " & UNICHAR(10) &
CALCULATE(
"[" & COUNTROWS(MyTable) & " " & ExcludeValue & " values]"
, MyTable[Code] = ExcludeValue
),
BLANK()
)
FINAL UPDATE
This is the final formula that works as expected. Thanks to Joe's help in this case.
=VAR ExcludeValue = "XYZ"
RETURN
IF(
CALCULATE(DISTINCTCOUNT(MyTable[Code]), MyTable[Code] <> ExcludeValue) > 0 &&
MyTable[Count of Code]>0,
CALCULATE(
"
[ Unique values " & DISTINCTCOUNT(MyTable[Code]) & " ]
" & CONCATENATEX(DISTINCT(MyTable[Code]), [Code], ", ")
, MyTable[Code] <> ExcludeValue
),
BLANK()
)
&
IF(
CALCULATE(DISTINCTCOUNT(MyTable[Code]), MyTable[Code] <> ExcludeValue) > 0 &&
CALCULATE(COUNTROWS(MyTable), MyTable[Code] = ExcludeValue) > 0,
"
" &
BLANK()
)
& IF(
CALCULATE(COUNTROWS(MyTable), MyTable[Code] = ExcludeValue) > 0,
CALCULATE(
"[" & COUNTROWS(MyTable) & " " & ExcludeValue & " values]"
, MyTable[Code] = ExcludeValue
),
BLANK()
) & "
"

UPDATE: - Changed my formula from using COUNT to COUNTROWS based on feedback from OP.
UPDATE 2: - Add IF statement to formula to exclude excluded count when 0.
UPDATE 3: - Add IF statement to formula to exclude distinct count when 0.
I will say that I created this solution in Power BI, but Excel 2016 should have the same functionality when it comes to DAX (with minor tweaks).
I created a measure with your formula, and simply wrapped each piece (the distinct count, and the repeated count) with a CALCULATE statement that is used to filter your MyTable down to the codes you care about.
I used a variable for the "XYZ" value in case that needs to be changed. Now you can simply change it in one place (at the beginning of the formula) and the rest of the formula will reflect that change.
I also used UNICHAR(10) to add the line breaks instead of counting on the new lines in the formula.
The IF statements work as follows:
The first will check if the distinct count of items not equal to the specified value is greater than zero. If not, it won't show anything.
The second will check if the distinct count and the row count of the specified value are both greater than zero. If they are, it will add the line break.
The third will check if the row count of items equal to the specified value is greater than zero. If not, it won't show anything.
The final formula is:
CountLabel =
VAR ExcludeValue = "XYZ"
RETURN
IF(
CALCULATE(DISTINCTCOUNT(MyTable[Code]), MyTable[Code] <> ExcludeValue) > 0,
CALCULATE(
"[ Unique values " & DISTINCTCOUNT(MyTable[Code]) & " ]"
& UNICHAR(10) &
CONCATENATEX(DISTINCT(MyTable[Code]), [Code], ", ")
, MyTable[Code] <> ExcludeValue
),
BLANK()
)
&
IF(
CALCULATE(DISTINCTCOUNT(MyTable[Code]), MyTable[Code] <> ExcludeValue) > 0 &&
CALCULATE(COUNTROWS(MyTable), MyTable[Code] = ExcludeValue) > 0,
UNICHAR(10) & " " & UNICHAR(10),
BLANK()
)
& IF(
CALCULATE(COUNTROWS(MyTable), MyTable[Code] = ExcludeValue) > 0,
CALCULATE(
"[" & COUNTROWS(MyTable) & " " & ExcludeValue & " values]"
, MyTable[Code] = ExcludeValue
),
BLANK()
)
Here is what the result looks like (again, in Power BI).

I came up with something similar but slightly different, using COUNTROWS over FILTER instead of CALCULATE to restrict the table to the values of interest. I am just learning DAX, so I don't know if this is a "proper" way to do it, but it seems to work.
Measure =
VAR Exclusion = "XYZ"
RETURN
"[ Unique values " & COUNTROWS(FILTER(DISTINCT(MyTable[CODE]), [CODE] <> Exclusion)) & "]
" & CONCATENATEX(FILTER(DISTINCT(MyTable[CODE]), [CODE] <> Exclusion), [CODE] ,", ") &
"
[" & COUNTROWS(FILTER(MyTable, MyTable[CODE] = Exclusion))+0 & " " & Exclusion & " values]"
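The logic that the measures above implement can be sketched outside DAX. The following Python snippet is only an illustration of the counting and conditional-output rules, not a replacement for the measure; the function name count_label and the list codes are hypothetical names introduced here, and the separator between the two sections follows the UNICHAR(10) & " " & UNICHAR(10) choice from Joe's formula.

```python
# Sample values of the CODE column from the question.
codes = ["FTRA2", "BRB92", "RBRB4", "XYZ", "SXM4", "RBRB4",
         "NLDR", "XYZ", "FTRA2", "POEU", "FTRA2"]


def count_label(values, exclude="XYZ"):
    """Build the label: distinct non-excluded values, then the excluded row count."""
    # Distinct values other than the excluded one, in first-seen order
    # (dict.fromkeys preserves insertion order).
    distinct = list(dict.fromkeys(v for v in values if v != exclude))
    # Number of rows (not distinct values) equal to the excluded value.
    excluded_rows = sum(1 for v in values if v == exclude)

    parts = []
    if distinct:  # skip this section when there is nothing to list
        parts.append(f"[ Unique values {len(distinct)} ]\n" + ", ".join(distinct))
    if excluded_rows:  # skip this section when the excluded value never occurs
        parts.append(f"[{excluded_rows} {exclude} values]")
    # The separator line is only emitted when both sections are present.
    return "\n \n".join(parts)


label = count_label(codes)
print(label)
```

Each of the three IF() branches in the DAX formula corresponds to one decision here: whether to emit the distinct list, whether to emit the separator, and whether to emit the excluded count.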

Commas in Julia 1.0 Numbers of Any Kind

I would like a function in Julia code, num2str, that would add commas or a specified delimiter, in the appropriate places on the left side of the decimal point for any kind of valid Julia number, including BigInt and BigFloat. It would return a string for printing.
For example:
flt1 = 122234.141567
println("flt1 num2str($flt1) = ", num2str(flt1))
# Expected output is:
flt1 num2str(122234.141567) = 122,234.141567
I want to use this function with the print and println built-in functions.

This question was partially answered here, i.e., for integers. The following function should cover floats and "big" numbers as well.
"""
For any valid number, add appropriate delimiters.
See "Regular Expressions Cookbook," by Goyvaerts and Levithan, O'Reilly, 2nd Ed,
p. 402, for Regex that inserts commas into integers returning a string.
"""
function num2str(num::Number; delim=",")
decimal_point = "."
str = string(num)
strs = split(str, decimal_point)
left_str = strs[1]
right_str = length(strs) > 1 ? strs[2] : ""
left_str = replace(left_str, r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))" => delim)
decimal_point = occursin(decimal_point, str) ? decimal_point : ""
return left_str * decimal_point * right_str
end
# Test integers, BigInts, floats, and BigFloats:
int0 = 0
int1 = 123
int2 = 123456789
big1 = big"123"
big2 = big"123456789123456789"
flt1 = 122234.141567
flt2 = 7.12345e9
big3 = big"260123.0"
big4 = big"7.12345e9"
setprecision(20)
println("int0 num2str($int0) \t\t\t\t = ", num2str(int0))
println("int1 num2str($int1) \t\t\t\t = ", num2str(int1))
println("int2 num2str($int2) \t\t\t = ", num2str(int2))
println("big1 num2str($big1) \t\t\t\t = ", num2str(big1))
println("big2 num2str($big2) \t\t = ", num2str(big2))
println("big2 num2str($big2) delim is _ \t = ", num2str(big2, delim="_"))
println("flt1 num2str($flt1) \t\t\t = ", num2str(flt1))
println("flt1 num2str($flt1) delim is _ \t\t = ", num2str(flt1, delim="_"))
println("flt2 num2str($flt2) \t\t\t = ", num2str(flt2))
println("big3 num2str($big3) \t\t\t = ", num2str(big3))
println("big4 num2str($big4) \t\t\t = ", num2str(big4))
println("big4 num2str($big4) delim is _ \t\t = ", num2str(big4, delim="_"))
## ============================== Output ===================================
int0 num2str(0) = 0
int1 num2str(123) = 123
int2 num2str(123456789) = 123,456,789
big1 num2str(123) = 123
big2 num2str(123456789123456789) = 123,456,789,123,456,789
big2 num2str(123456789123456789) delim is _ = 123_456_789_123_456_789
flt1 num2str(122234.141567) = 122,234.141567
flt1 num2str(122234.141567) delim is _ = 122_234.141567
flt2 num2str(7.12345e9) = 7.12345e9
big3 num2str(2.60123e+05) = 2.60123e+05
big4 num2str(7.12345e+09) = 7.12345e+09
big4 num2str(7.12345e+09) delim is _ = 7.12345e+09
I expect the ability to add comma delimiters will eventually be added to print/println and/or @printf. Until then, this seems to work.
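To show that the lookaround pattern itself is not specific to Julia, here is a rough Python sketch of the same technique. It takes the number as already-formatted text, because Python and Julia do not print floats identically; the name add_delims is hypothetical and introduced only for this illustration.

```python
import re

# Same lookaround pattern as in the Julia function above: a zero-width position
# preceded by a digit and followed by one or more complete groups of three digits.
_GROUPING = re.compile(r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))")


def add_delims(text, delim=","):
    """Insert delim into the digits to the left of the first decimal point."""
    # partition keeps the decimal point (or "" if there is none) so the
    # pieces can be glued back together unchanged.
    left, point, right = text.partition(".")
    # A function replacement avoids any special handling of backslashes in delim.
    return _GROUPING.sub(lambda _m: delim, left) + point + right


print(add_delims("123456789"))
print(add_delims("122234.141567"))
print(add_delims("123456789123456789", delim="_"))
```

As in the Julia version, text in exponent form such as "7.12345e9" passes through unchanged because only a single digit sits to the left of the decimal point.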

syntax error in SQL Update statement

I'm getting a syntax error in my UPDATE statement, but I'm not sure where exactly it is. Here's my code:
strSelected = "UPDATE CFRRR SET assignedby = " & Me.cmbassignedby.Column(1) & ", assignedto = " & _
Me.cmbassignedto.Column(2) & ", Dateassigned = " & Now() & ", actiondate = " & _
Now() & ", Workername = " & Me.cmbassignedto.Column(2) & ", WorkerID = " & _
Me.cmbassignedto.Column(1) & " WHERE CFRRRID In ( " & strSelected & " );"
CurrentDb.Execute strSelected

It's most likely because of the Now() function, which also outputs the current time (separated from the date by a space), hence the syntax error. Try surrounding those values with single quotation marks.
You can also print out the SQL statement with Debug.Print strSelected to see what you have concatenated.
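The failure mode described above can be reproduced in a small, self-contained sketch. This uses Python's sqlite3 module rather than Access/VBA, so it is an analogy and not the asker's environment; the table layout is a hypothetical reduction of CFRRR, and the second statement uses parameter binding, which is a different technique from the quoting suggested in the answer but avoids the same problem.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of the CFRRR table from the question.
conn.execute(
    "CREATE TABLE CFRRR (CFRRRID INTEGER PRIMARY KEY, Workername TEXT, Dateassigned TEXT)"
)
conn.executemany("INSERT INTO CFRRR (CFRRRID) VALUES (?)", [(1,), (2,), (3,)])

now = datetime(2020, 1, 2, 3, 4, 5).strftime("%Y-%m-%d %H:%M:%S")
worker = "Smith"

# Concatenating the timestamp without quotes puts a space in the middle of
# the value, so the SQL parser rejects the statement.
bad_sql = "UPDATE CFRRR SET Dateassigned = " + now + " WHERE CFRRRID IN (1, 2)"
print(bad_sql)  # the equivalent of Debug.Print: inspect what was concatenated
try:
    conn.execute(bad_sql)
    concat_failed = False
except sqlite3.OperationalError:
    concat_failed = True

# Binding the values as parameters sidesteps quoting altogether.
ids = [1, 2]
placeholders = ", ".join("?" for _ in ids)
conn.execute(
    f"UPDATE CFRRR SET Workername = ?, Dateassigned = ? WHERE CFRRRID IN ({placeholders})",
    [worker, now, *ids],
)
rows = conn.execute(
    "SELECT CFRRRID, Workername, Dateassigned FROM CFRRR ORDER BY CFRRRID"
).fetchall()
print(concat_failed, rows)
```

Printing the concatenated statement before executing it, as the answer suggests, makes the stray space in the date value visible immediately.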

Make Certificate Visible to user after user meets target

I am having trouble wrapping my head around this problem.
If you run the query embedded in code below, it shows Total Questions (TotalQuestions) asked, Total Correct (TotalCorrect2) and percentage correct (PercentCorrect2).
We have students who participate in video tutorials.
After each lesson, they are given quizzes, a total of 30 questions.
To pass each segment of the tutorials, a student must score at least 80% on the quizzes.
The code below is calculating Total questions, total correct and percentage correct.
Here is where I am having issues.
If a student answers all 30 questions AND gets 80% or more of the questions correct, a certificate is exposed to the student so s/he can print his or her certificate.
If both criteria are not met, the certificate stays hidden from the student.
I am having difficulty with the IF portion of this task.
For instance,
if totalQuestions = 30 and percentCorrect2 >= 80 then
btnGetUser.Visible = True
else
btnGetUser.Visible = False
end if
The above is not working.
When I run the code and a particular user meets the condition of the IF statement, the certificate is supposed to be exposed. The query returns the expected values when I run it in SSMS.
So far, the certificate has remained hidden even though I am testing with users who have taken and passed the tests and have met the conditions.
When I debug the code, I can see that totalQuestions shows 30 which is correct but the rest show 0.
Any ideas what I could be doing wrong?
Dim s As String = ";WITH Questions AS " & _
" ( SELECT SQ.QuestionID, " & _
" CorrectChoices = COUNT(NULLIF(SC.IsCorrect, 0)), " & _
" ChoicesGiven = COUNT(SA.ChoiceID), " & _
" CorrectChoicesGiven = COUNT(CASE WHEN SA.ChoiceID IS NOT NULL AND SC.IsCorrect = 1 THEN 1 END), " & _
" ExtraChoicesGiven = CASE WHEN COUNT(SA.ChoiceID) > COUNT(NULLIF(SC.IsCorrect, 0)) THEN COUNT(SA.ChoiceID) - COUNT(NULLIF(SC.IsCorrect, 0)) ELSE 0 END " & _
" FROM SurveyQuestions SQ " & _
" INNER JOIN SurveyChoices SC " & _
" ON SQ.QuestionId = SC.QuestionID " & _
" LEFT JOIN SurveyAnswers SA " & _
" ON SA.QuestionId = SC.QuestionID " & _
" AND SA.ChoiceID = SC.ChoiceID " & _
" AND SA.UserName = @username " & _
" GROUP BY SQ.QuestionID " & _
" ), QuestionScores AS " & _
" (SELECT QuestionID, " & _
" Score = CASE WHEN CorrectChoicesGiven - ExtraChoicesGiven < 0 THEN 0 " & _
" ELSE CAST(CorrectChoicesGiven - ExtraChoicesGiven AS FLOAT) / CorrectChoices " & _
" END, " & _
" Score2 = ISNULL(CAST(CorrectChoicesGiven AS FLOAT) / NULLIF(CASE WHEN ChoicesGiven > CorrectChoices THEN ChoicesGiven ELSE CorrectChoices END, 0), 0) " & _
" FROM Questions " & _
" ) " & _
" SELECT TotalQuestions = COUNT(*), " & _
" TotalCorrect = SUM(Score), " & _
" PercentCorrect = CAST(100.0 * SUM(Score) / COUNT(*) AS DECIMAL(5, 2)), " & _
" TotalCorrect2 = SUM(Score2), " & _
" PercentCorrect2 = CAST(100.0 * SUM(Score2) / COUNT(*) AS DECIMAL(5, 2)) " & _
" FROM QuestionScores;"
Dim connStr As String = ConfigurationManager.ConnectionStrings("DBConnectionString").ConnectionString
Dim conn As New SqlConnection(connStr)
Dim cmd As New SqlCommand(s, conn)
Dim TotalQuestions As Integer
Dim TotalCorrect As Double
Dim TotalPercent As Decimal
Dim TotalCorrect2 As Double
Dim TotalPercent2 As Decimal
conn.Open()
cmd.Parameters.AddWithValue("@username", username.Text)
Dim reader As SqlDataReader = cmd.ExecuteReader()
If reader.Read() Then
TotalQuestions = reader.GetInt32(0)
TotalCorrect = reader.GetDouble(1)
TotalPercent = reader.GetDecimal(2)
TotalCorrect2 = reader.GetDouble(3)
TotalPercent2 = reader.GetDecimal(4)
End If
reader.Close()
conn.Close()
If TotalQuestions = 30 and TotalPercent2 >= 80 Then
btnGetUser.Visible = True
Else
btnGetUser.Visible = False
End If

ASP.NET DataContext issue

I'm calling the ExecuteQuery method of my DataContext object. I expect a String and an Integer for each row as a result but all my values are nothing and 0 when I run the ToList function. All my results should be different strings and numbers. My query runs perfectly if I run it directly, but ExecuteQuery returns garbage instead of valid results. What can be the cause of this?
Thank you in advance.
Edit:
public function something as List(of Pair(of String, Integer))
Dim c As TTDataContext = ContextFactory.CreateDataContext()
Dim startValueLen = CStr(StartValue).Length
Dim query As String = "select top " & CStr(Limit) & " case " &
" when WONum like '0000%' then SUBSTRING(WONum, 5, Len(WONum) - 4) " &
" when WONum like '000%' then SUBSTRING(WONum, 4, Len(WONum) - 3) " &
" when WONum like '00%' then SUBSTRING(WONum, 3, Len(WONum) - 2) " &
" when WONum like '0%' then SUBSTRING(WONum, 2, Len(WONum) - 1) " &
" else WONum " &
" end as retVal, " &
" case " &
" when WONum like '0000%' then 1 " &
" when WONum like '000%' then 2 " &
" when WONum like '00%' then 3 " &
" when WONum like '0%' then 4 " &
" else LEN(WONum) " &
" end as retLen " &
" from TblWorkOrder " &
" where CompanyID = " & CStr(CompanyID) & " and LEN(WONum) >= " & CStr(startValueLen) & " and (WONum > '" & CStr(StartValue) & "' or LEN(WONum) > " & CStr(startValueLen) & ") " &
" order by retLen, retVal"
Dim temp = c.ExecuteQuery(Of Pair(Of String, Integer))(query)
Return temp.ToList
End Function

The cause of the problem was that my Pair class has First and Second properties, but my query did not return its columns under those names. The solution is to alias the first value as First and the second value as Second instead of retVal and retLen.
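Why mismatched column names lead to Nothing and 0 can be illustrated with a simplified model of name-based mapping. The Python sketch below is not how LINQ to SQL is implemented; materialize is a hypothetical helper that fills only the fields whose names match a result column and leaves every other field at its default.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Pair:
    # Defaults play the role of Nothing and 0 for unmatched members.
    First: Optional[str] = None
    Second: int = 0


def materialize(cls, column_names, rows):
    """Hypothetical mapper: fill the fields whose names match result columns."""
    wanted = {f.name for f in fields(cls)}
    result = []
    for row in rows:
        # Columns with no same-named field are silently ignored.
        matched = {name: value for name, value in zip(column_names, row) if name in wanted}
        result.append(cls(**matched))
    return result


rows = [("123", 3), ("4567", 4)]
# Columns named retVal/retLen match nothing, so every Pair keeps its defaults.
unmatched = materialize(Pair, ["retVal", "retLen"], rows)
# Columns aliased as First/Second are picked up as intended.
matched = materialize(Pair, ["First", "Second"], rows)
print(unmatched)
print(matched)
```

Under this model, renaming the aliases in the SELECT list is enough; the rest of the query stays the same.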
