I have the following string:
x<-"\"stream;\"\" Well done\"\t\" fans !!\"\";\"\"Boy\""
and I would like to change it to
x= "\"stream;\"\" Well done fans !!\"\";\"\"Boy\""
It would be great if anyone could help me remove \"\t\" from this string.
(from comment) You can use
sub("\"\t\"","",x)
That removes exactly what you're asking to be removed (though there is still an extra space compared to your desired output)
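A minimal sketch of the same idea, assuming the goal is to drop every quote-tab-quote sequence rather than just the first one. Here fixed = TRUE treats the pattern as a literal string instead of a regular expression, and gsub replaces all occurrences where sub would stop after the first. The variable name cleaned is only for illustration.

```r
x <- "\"stream;\"\" Well done\"\t\" fans !!\"\";\"\"Boy\""

# Remove the literal sequence: double quote, tab, double quote.
# fixed = TRUE switches off regex interpretation of the pattern.
cleaned <- gsub("\"\t\"", "", x, fixed = TRUE)

# writeLines prints the actual characters instead of the escaped form
writeLines(cleaned)
```

Printing with writeLines rather than print makes it easier to compare the result with the desired output, because the backslash escapes are not shown.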
I'm doing a swirl lesson.
This is the problem:
Edit the string inside writeLines() so that it correctly displays
(with the line breaks in these positions)
This is a really
really really
long string
I tried typing
writeLines("This is a really\n\nreally really\n\nlong string")
but the swirl lesson keeps telling me that it is incorrect. Is there a different way to write the same thing?
Swirl is generally very strict about the answer, as it would be time consuming and difficult to put in ways to check for all the potentially correct answers.
As a matter of fact, the answer is writeLines("This is a really \nreally really \nlong string") (see here). You have doubled the newline \n, so swirl won't accept that as an answer.
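To see why the doubled \n changes the output, here is a small sketch that counts the lines each version produces. It says nothing about what swirl itself accepts; the variable names are made up for illustration.

```r
single  <- "This is a really\nreally really\nlong string"
doubled <- "This is a really\n\nreally really\n\nlong string"

# Each \n ends one line, so a doubled \n leaves an empty line in between
n_single  <- length(strsplit(single,  "\n", fixed = TRUE)[[1]])
n_doubled <- length(strsplit(doubled, "\n", fixed = TRUE)[[1]])

writeLines(single)   # three lines, no blank lines between them
```

With a single \n the string splits into three lines, matching the requested layout; with \n\n it splits into five, two of them empty.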
I am scraping a very long forum thread, and I want to come up with a database that has columns containing the following info: date / full post text / quoted user / quoted text / clean text
The clean text should be each user's post, without the quotations if they are replying to anyone. If the post is not a reply, I would leave it as NA. The following is an invented post, with an invented user, to illustrate what I have managed to do so far:
post<-"Meow1 wrote: »\noday is gonna be the day that they're gonna throw it back to you?\nBy now you should've somehow Realized what you gotta do\n\n\nI don't believe that anybody Feels the way I do, about you now\nMeow1 wrote: »\nI'm sure you've heard it all before But you never really had a doubt\n\n\nBecause maybe, you're gonna be the one that saves me\nMeow1 wrote: »\nAnd after all, you're my wonderwall\n\n\nAnd all the lights that lead us there are blinding"
Then I try to pull out the quoted user (Meow1) and it works:
QuotedUser_1<-ifelse(grepl('wrote:', post), gsub('\\s*wrote.*$', '', post), NA)
QuotedUser_1
[1] "Meow1"
Then I wrote this code for pulling out the quoted text and the clean text:
Quotedtext_1<- ifelse(grepl('wrote:', post), gsub('^.*wrote\\s*|\\s*\\n\\n\\n.*$', '', post), NA)
It works when there is only one quoted text, but otherwise it returns only the last quoted bit (in the example, 'And after all, you're my wonderwall').
And same for the clean text, it only returns the last reply:
Clean_text<- sub('^.*\\n\\n\\n\\s*|\\s*wrote.*', '', post)
If anyone has a suggestion to improve the code, so that I can have a vector with all the quotations, and a vector with all the replies, I would be very grateful...
Cheers
Are you sure you cannot scrape the author and text information separately? Without a source it's difficult to know, but I guess they can be obtained by different css-selectors making it much easier to split the data.
If not, it might be helpful to look into str_locate_all, which allows you to locate all occurrences of e.g. "wrote:" and split the string accordingly.
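As a rough sketch of the split-the-string idea, here is a base R version that uses strsplit and gregexpr instead of str_locate_all. It assumes every quote follows the layout of the invented post: a "<user> wrote: »" header line, the quoted text, three newlines, then the reply. The post below is a shortened, hypothetical stand-in, and all variable names are illustrative.

```r
# Hypothetical, shortened post in the same layout as the invented example
post <- paste0(
  "Meow1 wrote: \u00bb\nfirst quote, line one\nfirst quote, line two\n\n\n",
  "first reply\n",
  "Meow1 wrote: \u00bb\nsecond quote\n\n\n",
  "second reply"
)

# Quoted users: everything on a header line before " wrote:"
headers <- regmatches(post, gregexpr("[^\n]* wrote:", post))[[1]]
quoted_user <- sub(" wrote:$", "", headers)

# Split at each header line; the match at the very start leaves a leading ""
blocks <- strsplit(post, "[^\n]* wrote:[^\n]*\n")[[1]]
blocks <- blocks[nzchar(blocks)]

# Inside each block, three newlines separate the quote from the reply
parts <- strsplit(blocks, "\n\n\n", fixed = TRUE)
quoted_text <- vapply(parts, function(p) p[1], character(1))
clean_text  <- vapply(parts, function(p) trimws(p[2]), character(1))
```

This yields one element per quote in quoted_user, quoted_text and clean_text, instead of only the last match. It will misfire if a quoted passage itself contains " wrote:" or three consecutive newlines, so scraping the pieces with separate selectors remains the more robust route when the page allows it.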
I am using the RODBC library to bring data into R. I have a long query that I want to pass a variable to, much like this SO user.
Problem is that R interprets the whitespace/carriage returns in my query as a newline '\n'.
The accepted solution for this question suggests to simply break up the text into chunks and then paste() together - which works, but ideally I'd like to keep the whitespace intact - makes it easier to test/verify the behavior of the query over in the database before pasting into R.
In other languages I'm familiar with there's a simple line continuation character - indeed, several of the comments on the accepted answer are looking for an approach similar to python's \.
I found an aside to a workaround using strwrap deep in the bowels of an R discussion list, so in the interest of making the internet better I will post it here. However, if someone can point the direction toward a more elegant/straightforward solution, I will happily accept your answer.
I don't know if you will find this helpful or not, but I have eventually gravitated towards keeping my SQL separate from my R scripts. Keeping the query in my R script, except for very very short ones, I find gets unreadable very quickly.
These days, I tend to keep queries that are more than a single line in their own separate .sql file. Then I can keep them nice and formatted and readable in a nice text editor, and read them into R as needed via something like this:
read_sql <- function(path) {
  # Fail early if the query file is missing
  stopifnot(file.exists(path))
  # Read the whole file into a single string
  readChar(path, nchars = file.info(path)$size)
}
For binding parameters into the queries, I just keep a %s where the parameter will go in the .sql file, and then add in the parameters in R using sprintf.
I've been much happier this way, as I was finding that cluttering up my R scripts with really long paste statements and multi-line character objects was making my code really hard to read.
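A small end-to-end sketch of that workflow, under some assumptions: the .sql file is written to a temporary path here only so the example is self-contained, the query is a cut-down version of the one in the question, and the value of map_term is made up.

```r
# Hypothetical query file; in practice it would sit next to the R script
path <- tempfile(fileext = ".sql")
writeLines(c(
  "SELECT map.ps_studentid",
  "  FROM map$comprehensive_with_growth map",
  " WHERE map.termname = '%s'"
), path)

read_sql <- function(path) {
  stopifnot(file.exists(path))
  readChar(path, nchars = file.info(path)$size)
}

map_term <- "Spring 2012"  # made-up parameter value

# sprintf substitutes the parameter where the %s placeholder sits
query <- sprintf(read_sql(path), map_term)
cat(query)
```

Two things to keep in mind with this approach: any literal % in the SQL (for example in a LIKE pattern) has to be written as %% so that sprintf leaves it alone, and sprintf does no escaping, so it is only suitable for parameter values you control.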
R's strwrap will destroy whitespace, including newline characters, per the documentation.
Essentially, you can get the desired behavior by initially letting R introduce line breaks/newline \ns, and then immediately stripping them out.
#make query using PASTE
query_1 <- paste("SELECT map.ps_studentid
,students.first_name || ' ' || students.last_name AS full_name
,map.testritscore
,map.termname
,map.measurementscale
FROM map$comprehensive_with_growth map
JOIN students
ON map.ps_studentid = students.id
WHERE map.termname = '",map_term,"'", sep='')
#remove newline characters introduced above.
#width is an arbitrary big number-
#it just needs to be longer than your string.
query_1 <- strwrap(query_1, width=10000, simplify=TRUE)
#execute the query
map_njask <- sqlQuery(XE, query_1)
query <- gsub(pattern='\\s',replacement=" ",x=query)
Try using sprintf to get variable substitution, and then replacing all newlines and whitespace.
See my answer to a similar question for details.
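A sketch of how the two steps might fit together, assuming a cut-down version of the query from the question and a made-up value for map_term. Runs of whitespace are collapsed to a single space here, so that neighbouring SQL tokens stay separated.

```r
map_term <- "Spring 2012"  # made-up parameter value

# The query keeps its database-friendly layout inside one R string
query <- sprintf("
  SELECT map.ps_studentid
        ,map.testritscore
    FROM map$comprehensive_with_growth map
   WHERE map.termname = '%s'
", map_term)

# Collapse every run of whitespace (newlines included) to a single space,
# then drop the leading and trailing space
query <- trimws(gsub("\\s+", " ", query))
```

One caveat: the substitution also collapses whitespace inside quoted SQL literals, so a value containing two consecutive spaces or a newline would be altered.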
I am trying to use regex generators to create an expression, but I can't seem to get it right.
What I need to do is find the following type of string in a string:
community_n
For example, within the string which may be
community community_1 community_new_1 community_1_new
from that, I just want to extract community_1
I have tried /(community_\\d+)/, but that is clearly not right.
Try adding word boundaries, so
/(\\bcommunity_\\d+\\b)/
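Since the question does not say which language is in use, here is a sketch in R (an assumption) comparing the pattern with and without word boundaries on the example string. The names loose and strict are illustrative.

```r
s <- "community community_1 community_new_1 community_1_new"

# Without boundaries the pattern also matches the start of "community_1_new"
loose <- regmatches(s, gregexpr("community_\\d+", s, perl = TRUE))[[1]]

# \b needs a word/non-word transition; "_" counts as a word character,
# so "community_1" followed by "_new" is rejected
strict <- regmatches(s, gregexpr("\\bcommunity_\\d+\\b", s, perl = TRUE))[[1]]
```

The unanchored pattern returns two matches, while the bounded one returns only the standalone community_1.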
Try using the regex (community_\d+).
Though I could be incorrect since I don't know which language you are using.
(For some reason I cannot add comments, I can only answer questions).
How can one insert a Unicode string CSS into CleverCSS?
In particular, how could one produce the following CSS using CleverCSS:
li:after {
content: "\00BB \0020";
}
I've figured out CleverCSS's parsing rules, but suffice it to say that the permutations I thought sensible have failed, for example:
li:
content: "\\00BB \\0020" // becomes content: 'BB 0'
EDIT: My other examples and the rest of my post weren't saved. Suffice to say that I had a longer list of examples that's missing.
I'd be grateful for any thoughts and input.
Brian
EDIT: I noted that inserting the Unicode was one of the problems (once you start uploading CSS with UTF-8 encoding it's fine). The wrapping of quote characters is another, which I solved with something crazy like so:
content: "'".string() + " ".string() ».string() + "'".string()
Hope that helps someone else.
This may be silly, but why still bother with escape sequences when you can just type/paste the actual characters? "A CSS style sheet is a sequence of characters from the Universal Character Set".
That is a lot easier on the eye, and is especially useful when maintaining existing code.
Or is CleverCSS not Unicode-enabled?
In looking at the code (CleverCSS 0.1) it would appear that the partial regular expression _r_string (defined on line 414) is where you would need to start. This is used to define several other REs, including _string_re which is used in the parsing rules (line 1374). This leads us to process_string() (line 1359) which looks like it was meant to accept Unicode.
Unfortunately, hand-built parsers tend to get a bit strange and the code is not exactly swimming in comments. If you really need to do this, I would focus on process_string() and put a bunch of before/after print statements in there and see if you can understand the goes-intos and goes-outofs.
You might also try bribing the original author with beer or ??? Good luck.