Remove conditional sequence from string in R - r

I have a sequence encoded in a string, but one type of step in this sequence is entirely conditional on a previous step.
When this occurs, I'd like to remove the previous step.
For example, in the case:
"alpha_i, bravo_i, alpha_i, alpha_c, charlie_i, bravo_i, bravo_c,
alpha_i, delta_c"
where a *_c event occurs directly after the *_i event of the same name, I'd like to have the *_i event removed, the desired result being:
"alpha_i, bravo_i, alpha_c, charlie_i, bravo_c, alpha_i,
delta_c"
In other words,
"alpha_i, alpha_c" goes to just "alpha_c"
"bravo_i, bravo_c" goes to just "bravo_c",
but we do not change "alpha_i, delta_c" because they are a different event name.
I think the solution would use the gsub function, but I don't know how to match the prefix on either side of the comma, and would appreciate some help.
*In response to the point raised below: yes, there will be many different event names, not just the two being replaced here.

Try this:
wds <- c("alpha_i", "bravo_i", "alpha_i", "alpha_c", "charlie_i", "bravo_i", "bravo_c", "alpha_i", "delta_c")
# Take each event name up to the underscore, then keep the last element of every run of identical names
# (note: this keeps the last element of a run whatever its suffix is)
wds[cumsum(rle(substr(wds, 1, regexpr('_', wds)))$lengths)]
Alternatively, if your vector is of length 1, try this:
wds <- c("alpha_i, bravo_i, alpha_i, alpha_c, charlie_i, bravo_i, bravo_c, alpha_i, delta_c")
# Split the single string into one element per step first
wds_split <- unlist(strsplit(wds, ', '))
wds_split[cumsum(rle(substr(wds_split, 1, regexpr('_', wds_split)))$lengths)]
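The question also asks how a single substitution could match the same prefix on both sides of the comma. A minimal sketch of that idea, written in Python rather than R, uses a back-reference inside a lookahead. The helper name drop_superseded_steps is hypothetical, and the pattern assumes steps are separated by ", " and that event names consist of word characters. R's gsub with perl = TRUE should accept the same kind of pattern, but that variant is not shown or tested here.

```python
import re

# Hypothetical helper name; assumes steps are separated by ", "
# and that event names are made of word characters.
def drop_superseded_steps(sequence):
    # (\w+)_i captures the event name; the lookahead (?=\1_c\b) requires
    # that the very next step is the same name with a _c suffix.
    return re.sub(r"\b(\w+)_i, (?=\1_c\b)", "", sequence)

steps = "alpha_i, bravo_i, alpha_i, alpha_c, charlie_i, bravo_i, bravo_c, alpha_i, delta_c"
print(drop_superseded_steps(steps))
# alpha_i, bravo_i, alpha_c, charlie_i, bravo_c, alpha_i, delta_c
```

Unlike the run-length approach, this only removes an *_i step when the following step is the *_c step of the same name, so "alpha_i, delta_c" is left unchanged.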

Related

Update dictionary key inside list using map function -Python

I have a list of dictionaries of phone numbers, where the number is the key and the country is the value. I want to update each key to add a country code based on the country value. I tried to use the map function for this:
print('**Example: Update phone book to add Country code using map function** ')
user=[{'952-201-3787':'US'},{'952-201-5984':'US'},{'9871299':'BD'},{'01632 960513':'UK'}]
#A function that takes a dictionary as arg, not list. List is the outer part
def add_Country_Code(aDict):
    for k,v in aDict.items():
        if(v == 'US'):
            aDict[( '1+'+k)]=aDict.pop(k)
        if(v == 'UK'):
            aDict[( '044+'+k)]=aDict.pop(k)
        if (v == 'BD'):
            aDict[('001+'+k)] =aDict.pop(k)
    return aDict
new_user=list(map(add_Country_Code,user))
print(new_user)
This partially works when I run it; the output is below:
[{'1+952-201-3787': 'US'}, {'1+1+1+952-201-5984': 'US'}, {'001+9871299': 'BD'}, {'044+01632 960513': 'UK'}]
Notice that the second US number has two additional "1+" prefixes. What is causing that, and how can I fix it? Thanks a lot.
Issue
You are mutating a dict while iterating over it. Don't do this. The Pythonic convention would be:
Make a new_dict = {}
While iterating over the input a_dict, assign new items to new_dict.
Return the new_dict
In other words, create new things rather than change old things; changing the dict in place is likely the source of your woes.
Some notes
Use lowercase with underscores when defining variable names (see PEP 8).
Look up values rather than change the input dict, e.g. a_dict[k] vs. a_dict.pop(k)
Indent the correct number of spaces (see PEP 8)
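The steps above can be sketched as follows. This is a minimal illustration, not the only way to do it: the COUNTRY_PREFIXES lookup table and the name add_country_code are introduced here for the example, and the prefixes are copied from the question's code rather than checked against real dialing codes.

```python
# Hypothetical lookup table; the prefixes mirror the ones used in the question.
COUNTRY_PREFIXES = {"US": "1+", "UK": "044+", "BD": "001+"}

def add_country_code(a_dict):
    """Return a new dict whose keys carry the country prefix; the input is left untouched."""
    new_dict = {}
    for number, country in a_dict.items():
        # Unknown countries get no prefix (an assumption made for this sketch).
        prefix = COUNTRY_PREFIXES.get(country, "")
        new_dict[prefix + number] = country
    return new_dict

user = [{'952-201-3787': 'US'}, {'952-201-5984': 'US'}, {'9871299': 'BD'}, {'01632 960513': 'UK'}]
new_user = list(map(add_country_code, user))
print(new_user)
# [{'1+952-201-3787': 'US'}, {'1+952-201-5984': 'US'}, {'001+9871299': 'BD'}, {'044+01632 960513': 'UK'}]
```

Because nothing is added to or removed from a_dict during the loop, each key is visited exactly once and no prefix is applied twice.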

LPAD is not working in progress 4gl

I am trying to use lpad in a Progress DB, but it's not working.
Code:
lpad(act_num, 7, '#')
This code is not working. Is there an alternative way to achieve the desired output?
If act_num is 101, then the output should be 7777101.
There is no lpad() function in OpenEdge, but you may be able to use the FILL() function. It takes two inputs: a character string to use as the fill value, and the number of times to repeat the string.
This will add four "7"s to the beginning of act_num, as you described in your question:
DEFINE VARIABLE act_num AS CHARACTER NO-UNDO INITIAL "101".
act_num = FILL("7", 4) + act_num.
MESSAGE act_num VIEW-AS ALERT-BOX.
The fill value can be any string, and not just a single character.

Select any character string over an NA in an If statement in R

I am trying to create a function which will look at two vectors of character labels, and print the appropriate label based on an If statement. I am running into an issue when one of the vectors is populated by NA.
I'll truncate my function:
eventTypepriority=function(a,b) {
  if(is.na(a)) {print(b)}
  if(is.na(b)) {print(a)}
  if(a=="BW"& b=="BW") {print("BW")}
  if(a=="?BW"& b=="BW") {print("?BW")}
  ...#and so on
}
Some data:
a=c("Pm", "BW", "?BW")
b=c("PmDP","?BW",NA)
c=mapply(eventTypepriority, a,b, USE.NAMES = TRUE)
The function works fine for the first two, selecting the label I've designated in my if statements. However, when it gets to the third pair I receive this error:
Error in if (a == "?BW" & b == "BW") { :
missing value where TRUE/FALSE needed
I'm guessing this is because at that point b=NA, and this is the first if statement outside of the 'is.na' statements that needs to ignore missing values.
Is there a way to handle this? I'd really rather not add conditional statements for every label and NA. I've also tried:
-is.null (same error message)
-Regular Expressions:
if(a==grepl([:print:]) & b==NA) {print(a)}
I tried this in various forms, including if(a==grepl(:print:)..., to no avail. I receive an 'Error: unexpected '[' (or whichever character R objected to first) telling me this is wrong.
All comments and thoughts would be appreciated. ^_^
If all your if conditions are mutually exclusive, just call return() to avoid checking the other conditions once one is met:
eventTypepriority=function(a,b) {
  if(is.na(a)) {print(b);return()}
  if(is.na(b)) {print(a);return()}
  if(a=="BW"& b=="BW") {print("BW");return()}
  if(a=="?BW"& b=="BW") {print("?BW");return()}
  ...#and so on
}
You need to use if .. else statements instead of plain if statements; otherwise, your function will evaluate the 3rd and 4th lines even when one of the values is NA.
Given your mapply statement, I also assume you want the function to return the corresponding label, not just print it?
In that case:
eventTypepriority<-function(a,b) {
  if(is.na(a)) b
  else if(is.na(b)) a
  else if(a=="BW"& b=="BW") "BW"
  else if(a=="?BW"& b=="BW") "?BW"
  else "..."
}
a=c("Pm", "BW", "?BW")
b=c("PmDP","?BW",NA)
c=mapply(eventTypepriority, a,b, USE.NAMES = T)
c
returns
   Pm    BW   ?BW
"..." "..." "?BW"
If you actually want to just print the label and have your function return something else, you should be able to figure it out from here.

Unexpected string constant in R when try to select colname from data.table

I am trying to group my customized MovieLens dataset:
groupBy<- data.table(unifiedTbl)
x<- groupBy[,list(rating=sum(rating)
                  ,Unknown=sum(unknown)
                  ,Action=sum(Action)
                  ,Adventure = sum(Adventure)
                  ,Animation = sum(Animation)
                  ,"Children's" = sum(Children's)
                  ),by=list(user_id,age,occupation)]
but because of Children's I receive an error related to the special character.
If I remove the part of my code below, everything is OK:
,"Children's" = sum(Children's)
How can I refer to this column by its full name, and how can I fix my code?
You can use backticks with names that aren't valid syntax:
`Children's` = sum(`Children's`)
And of course, I'd recommend creating valid names instead:
setnames(groupBy, make.names(names(groupBy)))

R - create iterable list/dataframe from unique()

I'd like to get the unique elements from a column. That seems straightforward. Both of these work, but I'm not getting the object type I'd like:
userlist <- as.list(somebigdf$username)
userlist <- unique(userlist)
or
userlist <- unique(somebigdf$username)
When I iterate through, I'm not getting the names:
for(i in 1:length(userlist)){
  cat(names(userlist[i]), '\n')
}
Returns blank spaces.
for(i in userlist){
  cat(i, '\n')
}
Returns integers.
The above function is just an example. I'll be using that but also matching the returned name in an if-else function.
The object types seem to be integers or an extended data.frame with lots of values for each name - which isn't what I want. I would really just like a list of strings something along the lines of userlist = c( the results from unique).
Edit -
This code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
  cat(name, '\n')
}
I'm accepting my own answer, since it is a working solution. This code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
  cat(name, '\n')
}
If someone at a later date has a better answer that seems more in keeping with the question, I will be happy to accept that as the answer.
