When does pressing the Enter (Return) key matter when creating a regex matching expression? - r

I want to search for multiple codes appearing in a cell. There are so many codes that I'd like to write parts of the code in succeeding lines. For example, let's say I am looking for "^a11","^b12", "^c67$" or "^d13[[:blank:]]". I am using:
^a11|^b12|^c67$|^d13[[:blank:]]
This seems to work. Now, I tried:
^a11|^b12|
^c67$|^d13[[:blank:]]
That also seemed to work. However when I tried:
^a11|^b12|^c67$|
^d13[[:blank:]]
It did not count the last one.
Note that my code is wrapped into a function. So the above is an argument that I feed the function. I'm thinking that's the problem, but I still don't know why one truncation works while the other does not.

I realized the answer today. The problem is that since I am feeding the regex argument, it will count the next line in the succeeding code.
Thus, the code below was only "working" because ^c67$ is empty.
^a11|^b12|
^c67$|^d13[[:blank:]]
And the code below was not working because ^d13 is not empty but also this setup looks for (next line)^d13[[:blank:]] instead of just ^d13[[:blank:]]
^a11|^b12|^c67$|
^d13[[:blank:]]
So an inelegant fix is:
^a11|^b12|^c67$|
^nothinghere|^d13[[:blank:]]
This inserts a burner code that is empty which is affected by the line break.

Related

How to get console cursor back to the begining of line in R?

Sometimes I perform long lasting calculations in the loop. To get control of the progress I print current value of the loop variable. However it makes my console flooded with numbers.
I would like to move back cursor to the begining of console line after each printing. This would end with the last value of the variable only. That would be similar to command without on line printer.
Any idea on how to achieve that in R console?
as b2f correctly mentions in the comment use \r as in the example below
cat('hello world\rMinor')
Minor world
One thing to note though is that every command is assigned a new line in an interactive setting, so in interactive code this only works as part of a single command.

How to edit hidden character in String

The appearance of "textparcali" in RStudio Source Editor was as follows.
In textparcali (tbl_df), I ran the following code to delete single strings.
textparcali$word<-gsub("\\W*\\b\\w\\b\\W*",'', textparcali$word)
But the deletion was interesting. You can see the picture below. Please note lines 67 and 50.
Everything was fine for line 50 and lines like that. However, this was not the case for line 67 (and I think there are others like it).
I focused on one line(67) to understand why you deleted it wrong. I've already seen what it says on this line in the editor. But I also wanted to look at the console. I wrote the following code to the console.
textparcali$word[67]
The word on line 67 looks different in the console. The value that doesn't appear when you make a copy paste but surprisingly appears on the console:
The reason I put it as a picture is because this character disappears after the copy-paste command.
You can download the file containing this character from the link below. However, you should open it with Notepad ++.
Character.txt
Gsub did his job right. How is that possible? What's the name of this character? When I try to write code that destroys this character, the " sign changes and does not delete.
textparcali$word<-gsub('[[:punct:]]+',' ',textparcali$word) command also does not work.
What is the explanation of my experience? I do not know. Is there a way to destroy this character? What caused this? I ve asked a lot.
Thank you all.
(I apologize for the bad scribbles in the pictures.)
I found the surprise character.
Above Right, Combining Dot ͘ ͘
The following is the code required to eliminate this character.
c<-"surprise character"
c
[1] "\u0358"
textparcali$word<-gsub("\u0358","",textparcali$word,ignore.case = FALSE)
textparcali$word<-gsub("\u307","",textparcali$word,ignore.case = FALSE)
Code 307 did the job for me. However, you should determine what the actual code is. If not, your character code may be incorrect.
More detailed information can be found in the links below.
https://gist.github.com/ngs/2782436
https://www.charbase.com/0358-unicode-combining-dot-above-right
Thanks a lot!

Parsing strings with grep/str_extract

As part of my feature engineering, I need to parse text strings from different languages and keep text enclosed within parentheses. Everything was going well until I encountered a very strange phenomenon. For some languages, the parentheses I need to find look slightly different, and various regexp options fail.
I'm pasting screen-shots because strangely, copying and pasting the strange parentheses changes it to a 'normal' one, so I can't set up a different regex to find those separately.
Notice that the parentheses in the first entry look normal, but for the second entry, it appears sort of 'sharp'
If I use stringr's str_extract, the first instance works fine, but the second fails.
But, the encodings are the same. Anyone know what's going on?
[Edit: here are the results of dput on these same examples. dput apparently sees the parentheses as equivalent, even though grep does not]
c("Obnaružena poterâ šaga na (Motor šprica pipettora R1).", "(STAT tàn zhen Z zhóu ma dá) tàn cè dào diu bù<U+3002>")
Finally, I am actually copy and pasting the two parentheses from R into the code window below; they do appear different this way. First is normal, second is the strange one.
( (

How to replace a string pattern with different strings quickly?

For example, I have many HTML tabs to style, they use different classes, and will have different backgrounds. Background images files have names corresponding to class names.
The way I found to do it is yank:
.tab.home {
background: ...home.jpg...
}
then paste, then :s/home/about.
This is to be repeated for a few times. I found that & can be used to repeat last substitute, but only for the same target string. What is the quickest way to repeat a substitute with different target string?
Alternatively, probably there are more efficient ways to do such a thing?
I had a quick play with some vim macro magic and came up with the following idea... I apologise for the length. I thought it best to explain the steps..
First, place the text block you want to repeat into a register (I picked register z), so with the cursor at the beginning of the .tab line I pressed "z3Y (select reg z and yank 3 lines).
Then I entered the series of VIM commands I wanted into the buffer as )"zp:.,%s/home/. (Just press i and type the commands)
This translate to;
) go the end of the current '{}' block,
"zp paste a copy of the text in register z,
.,%s/home/ which has two tricks.
The .,% ensures the substitution applies to everything from the start of the .tab to the end of the closing }, and,
The command is incomplete (ie, does not have a at the end), so vim will prompt me to complete the command.
Note that while %s/// will perform a substitution across every line of the file, it is important to realise that % is an alias for range 1,$. Using 1,% as a range, causes the % to be used as the 'jump to matching parenthesis' operator, resulting in a range from the current line to the end of the % match. (which in this example, is the closing brace in the block)
Then, after placing the cursor on the ) at the beginning of the line, I typed "qy$ which means yank all characters to the end of the line into register q.
This is important, because simply yanking the line with Y will include a carriage return in the register, and will cause the macro to fail.
I then executed the content of register q with #q and I was prompted to complete the s/home/ on the command line.
After typing the replacement text and pressing enter, the pasted block (from register z) appeared in the buffer with the substitutions already applied.
At this point you can repeat the last #qby simple typing ##. You don't even need to move the cursor down to the end of the block because the ) at the start of the macro does that for you.
This effectively reduces the process of yanking the original text, inserting it, and executing two manual replace commands into a simple ##.
You can safely delete the macro string from your edit buffer when done.
This is incredibly vim-ish, and might waste a bit of time getting it right, but it could save you even more when you do.
Vim macro's might be the trick you are looking for.
From the manual, I found :s//new-replacement. Seemed to be too much typing.
Looking for a better answer.

Improperly formatted CSV, how to repair?

I have a csv, and each line reads as follows:
"http://www.videourl.com/video,video title,video duration,thumbnail,<iframe src=""http://embed.videourl.com/video"" frameborder=0 width=510 height=400 scrolling=no> </iframe>,tag 1,tag 2",,,,,,,,,,,,,,,,,,,,,,,,,,
Is there a program I can use to clean this up? I'm trying to import it to wordpress and map it to current fields, but it isn't functioning properly. Any suggestions?
Just use search and replace in this case. remove the commas at the end and then replace the remaining commas with ",".
Should anyone else have the same issue. Know that this solution will only work with data much like the example giving. If data has a lot of text and there are commas within the text that need kept. Then search replacing comma will not work. Using regex would be the next option and that can be done in Notepad ++
However I think the regex pattern depends on the data so not much point creating an example.
PHP could be used to explode each line also. Remove values that match a regex out of many i.e. URL, money. Then what is left could be (depending on the data again) just a block of text. That approach may not work if there are two or more columns with a lot of text

Resources