SQLite: how to replace characters while excluding some characters

I have an SQLite file containing words and their phonetic transcription as follows:
Source (sɔɹs)
Song (sɔŋ)
Daughter (ˈdɔtəɹ)
...
I want to replace every 'ɔ' character with 'a', unless the 'ɔ' is followed by 'ŋ' or 'ɹ'.
My code so far:
UPDATE words
SET phonetictranscription_ame = replace(phonetictranscription_ame, "ɔ", "a")
WHERE phonetictranscription_ame NOT IN ("ɔŋ", "ɔɹ")
This code replaces 'ɔ' with 'a', but it also replaces the 'ɔ' in words containing "ɔŋ" and "ɔɹ". Is something wrong with my code?

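The WHERE clause compares the whole column value with the strings "ɔŋ" and "ɔɹ", so it is true for practically every row, and replace() then changes every 'ɔ' in those rows.
First, replace all occurrences of 'ɔŋ' and 'ɔɹ' with placeholders that you are sure do not exist in the column (I chose '|1' and '|2', but I'm not a linguist, so you can change them).
Then replace all remaining 'ɔ's with 'a's.
Finally, restore the 'ɔŋ's and 'ɔɹ's: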
UPDATE words
SET phonetictranscription_ame =
  REPLACE(
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(
            phonetictranscription_ame, 'ɔŋ', '|1'
          ), 'ɔɹ', '|2'
        ), 'ɔ', 'a'
      ), '|1', 'ɔŋ'
    ), '|2', 'ɔɹ'
  )
WHERE phonetictranscription_ame LIKE '%ɔ%';
See the demo.
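As a quick way to check the statement above without touching the real file, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the `word` column name are assumptions made for illustration; only `words` and `phonetictranscription_ame` come from the question.

```python
import sqlite3

# Hypothetical in-memory table; the "word" column name is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, phonetictranscription_ame TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("Source", "sɔɹs"), ("Song", "sɔŋ"), ("Daughter", "ˈdɔtəɹ")],
)

# Protect 'ɔŋ' and 'ɔɹ' with placeholders, replace the remaining 'ɔ',
# then turn the placeholders back into the original sequences.
conn.execute(
    """
    UPDATE words
    SET phonetictranscription_ame =
        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
            phonetictranscription_ame, 'ɔŋ', '|1'), 'ɔɹ', '|2'),
            'ɔ', 'a'), '|1', 'ɔŋ'), '|2', 'ɔɹ')
    WHERE phonetictranscription_ame LIKE '%ɔ%'
    """
)

# Collect the rows as {word: transcription} for inspection.
result = dict(conn.execute("SELECT word, phonetictranscription_ame FROM words"))
print(result)
```

With this sample data only the transcription of "Daughter" should change; the other two rows keep their 'ɔɹ' and 'ɔŋ'.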

Airflow SqlToS3Operator adds an unwanted index column at the beginning

Recent versions of airflow-providers-amazon have deprecated MySQLToS3Operator and introduced SqlToS3Operator, which now adds an index column at the beginning of the CSV dump.
For example, if I run the following:
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
)
The S3 file has something like this:
,created_at,score
1,2023-01-01,5
2,2023-01-02,6
The output seems to be a direct dump from Pandas. How can I remove this unwanted preceding index column?
The operator uses a pandas DataFrame under the hood.
You should use pd_kwargs, which lets you pass arguments on to DataFrame .to_parquet(), .to_json() or .to_csv().
Since your output is CSV, the relevant pandas.DataFrame.to_csv parameters are:
header: bool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index: bool, default True
Write row names (index).
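Thus you can do the following. Note that "header": False also drops the row of column names; leave it out if you only want to remove the index: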
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    file_format="csv",
    pd_kwargs={"index": False, "header": False},
)

Add Header to an XML to CSV conversion using XQuery

I am trying to convert some XML to CSV using XQuery, and a previous post helped me get to this point:
for $b in /root/Result
return
concat(escape-html-uri(string-join(($b/HolidayEndDate,
$b/HolidayType,
$b/FirstName,
$b/AllowanceRemainingDays,
$b/HolidayStartDate,
$b/EmployeeId,
$b/AllowanceDays,
$b/LastName,
$b/HolidayDurationDays
)
/normalize-space(),
",")
),
codepoints-to-string(10))
This returns all of the data as required, but no header row. Is there a simple addition to the above code that would also return the header row? Thanks. :)
Since your query returns a sequence of lines, you can just prepend another line before the FLWOR expression:
"HolidayEndDate,HolidayType,FirstName,AllowanceRemainingDays,HolidayStartDate,EmployeeId,AllowanceDays,LastName,HolidayDurationDays&#10;",
for $b in /root/Result
return
concat(escape-html-uri(string-join(($b/HolidayEndDate,
$b/HolidayType,
$b/FirstName,
$b/AllowanceRemainingDays,
$b/HolidayStartDate,
$b/EmployeeId,
$b/AllowanceDays,
$b/LastName,
$b/HolidayDurationDays
)
/normalize-space(),
",")
),
codepoints-to-string(10))
Because nested sequences are flattened (i.e. concatenated) in XQuery, this results in one output sequence including the header. Note also that I used the character reference '&#10;' for the newline character, which is much shorter than codepoints-to-string(10).
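Alternatively, to return one single string instead of a sequence of lines, join the rows with string-join and concatenate the header in front:
concat("HolidayEndDate,HolidayType,FirstName,AllowanceRemainingDays,HolidayStartDate,EmployeeId,AllowanceDays,LastName,HolidayDurationDays&#10;",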
string-join(
for $b in /root/Result
return
concat(escape-html-uri(string-join(($b/HolidayEndDate,
$b/HolidayType,
$b/FirstName,
$b/AllowanceRemainingDays,
$b/HolidayStartDate,
$b/EmployeeId,
$b/AllowanceDays,
$b/LastName,
$b/HolidayDurationDays
)
/normalize-space(),
",")
),
codepoints-to-string(10)), "")
)

How to make gsub() work on entire column?

I am trying to use gsub to replace the hex characters I have with Hebrew letters,
using the following calls:
name<-gsub("\u0080","א",name)
name<-gsub("\u0081","ב",name)
name<-gsub("\u0082","ג",name)
name<-gsub("\u0083","ד",name)
name<-gsub("\u0084","ה",name)
name<-gsub("\u0085","ו",name)
name<-gsub("\u0086","ז",name)
name<-gsub("\u0087","ח",name)
name<-gsub("\u0088","ט",name)
name<-gsub("\u0089","י",name)
name<-gsub("\u008a","ך",name)
name<-gsub("\u008b","כ",name)
name<-gsub("\u008c","ל",name)
name<-gsub("\u008d","ם",name)
name<-gsub("\u008e","מ",name)
name<-gsub("\u008f","ן",name)
name<-gsub("\u0090","נ",name)
name<-gsub("\u0091","ס",name)
name<-gsub("\u0092","ע",name)
name<-gsub("\u0093","ף",name)
name<-gsub("\u0094","פ",name)
name<-gsub("\u0095","ץ",name)
name<-gsub("\u0096","צ",name)
name<-gsub("\u0097","ק",name)
name<-gsub("\u0098","ר",name)
name<-gsub("\u0099","ש",name)
name<-gsub("\u009a","ת",name)
I have a variable called 'name' that contains the hex characters, for example:
[1] "-"
[2] "\u0083 \u0087\u0082\u0080 \u008f\u008c\u0098\u0080 \u0081\u0089\u0081\u0080"
[3] "-"
[4] "\u0084 \u0087\u0082\u0080 \u008f\u008c\u0098\u0080 \u0081\u0089\u0081\u0080"
When I insert the values into a vector manually, like this:
name<- c("-" ,
"\u0083 \u0087\u0082\u0080 \u008f\u008c\u0098\u0080 \u0081\u0089\u0081\u0080",
"-" ,
"\u0084 \u0087\u0082\u0080 \u008f\u008c\u0098\u0080 \u0081\u0089\u0081\u0080")
and run my script, it works. But when I try to run it over the whole database, using the following script to fill the 'name' variable:
cond<-list_kind %in% c("02")
name<-ifelse(cond,substr(data_set$data_from_row,25,39),"-")
(because I only need the names in list kind 2)
it just prints the names as they were, as hex.

Finding a word with condition in a vector with regex on R (perl)

I would like to find the elements of a vector that contain the word 'RT' (or 'R'), but not if the word 'RT' is preceded by 'no'.
The word RT may be preceded by nothing, a space, a dot, etc.
I tried the following regex:
grep("(?<=[no] )RT", aaa,ignore.case = FALSE, perl = T)
Which was giving me all the rows with "no RT".
and
grep("(?=[^no].*)RT",aaa , perl = T)
which was giving me all the rows containing 'RT' with and without 'no' at the beginning.
What is my mistake? I thought the ^ was giving everything but the character that follows it.
Example:
aaa = c("RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+","npo RT" )
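[no] is a character class, so (?<=[no] )RT matches any RT that is immediately preceded by "n " or "o ". Likewise, [^no] matches any single character other than n or o; it does not mean "not the word no".
You should use a negative lookbehind: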
"(?<!no )RT"
See the regex demo.
Or, if you need to check for a whole word no,
"(?<!\\bno )RT"
See this regex demo.
Here, (?<!no ) makes sure there is no "no " immediately to the left of the current location, and only then is RT consumed.
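The two lookbehind patterns can also be cross-checked outside R. The sketch below uses Python's re module, which accepts the same lookbehind syntax for these fixed-width patterns. It is only an illustration, not the R call itself, and the string "piano RT" is a made-up example added to show where the two patterns differ.

```python
import re

# Sample vector from the question.
aaa = ["RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+", "npo RT"]

plain = re.compile(r"(?<!no )RT")         # rejects RT right after the characters "no "
whole_word = re.compile(r"(?<!\bno )RT")  # rejects RT only after the whole word "no"

matches_plain = [s for s in aaa if plain.search(s)]
matches_whole = [s for s in aaa if whole_word.search(s)]

# Hypothetical extra string: "no " here is the tail of "piano", not a word.
extra = "piano RT"
extra_plain = plain.search(extra) is not None
extra_whole = whole_word.search(extra) is not None

print(matches_plain)
print(matches_whole)
print(extra_plain, extra_whole)
```

On the sample vector both patterns keep every element except "no RT". They only disagree on strings like "piano RT", where the word-boundary version still matches.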

String recognition in IDL

I have the following strings:
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat
and from each I want to extract three variables: 1. SWIR32, 2. the date, and 3. the text following the date. I want to automate this process for about 200 files, so selecting the positions individually won't work for me.
So I want:
variable1=SWIR32
variable2=2005210
variable3=East_A
variable4=SWIR32
variable5=2005210
variable6=Froemke-Hoy
I am going to use these to add titles to graphs later on, but since the position of the text in each string varies, I am unsure how to do this using STRMID.
I think you want to use a combination of STRPOS and STRSPLIT. Something like the following:
s = ['F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat', $
     'F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat']
name = STRARR(s.length)
date = name
txt = name
foreach sub, s, i do begin
  sub = STRMID(sub, 1+STRPOS(sub, '\', /REVERSE_SEARCH))
  parts = STRSPLIT(sub, '_', /EXTRACT)
  name[i] = parts[0]
  date[i] = parts[1]
  txt[i] = STRJOIN(parts[2:*], '_')
endforeach
You could also do this with a regular expression (using just STRSPLIT) but regular expressions tend to be complicated and error prone.
Hope this helps!
