Remove re-occurring text strings [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I am new to R and have searched the forum for almost 2 hours now without getting it to work for me.
My problem: I have a long text string scraped from the internet. The scrape also picked up the embed code for images. Each embed block starts with "Embed from Getty Images" and ends with "false })});\n". I would like to remove each block, including everything between those two strings. I have tried gsub() as follows:
AmericanTexts3 <- gsub("Embed.*})});\n", "", AmericanTexts)
But this removes everything between the start of the first picture and the end of the last picture. Does anyone know how to solve this?

You need to use a non-greedy (lazy) regular expression.
Try
AmericanTexts3 <- gsub("Embed.*?})});\n", "", AmericanTexts)
The ? after .* makes the quantifier lazy: it stops at the first occurrence of the closing part of the regex instead of the last one, so only the text between each pair of markers is removed.
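A minimal sketch of the difference between the greedy and the lazy version, using a made-up sample string (the sample text and the names pattern, cleaned and greedy are hypothetical). It spells out both markers and escapes the braces and parentheses so that the pattern also works with perl = TRUE; (?s) lets . match newlines in case an embed block spans several lines.

```r
# Hypothetical sample text containing two embed blocks
AmericanTexts <- paste0(
  "Intro. ",
  "Embed from Getty Images a b false })});\n",
  "Middle text. ",
  "Embed from Getty Images c d false })});\n",
  "End."
)

# Lazy .*? stops at the first closing marker after each opening marker
pattern <- "(?s)Embed from Getty Images.*?false \\}\\)\\}\\);\n"
cleaned <- gsub(pattern, "", AmericanTexts, perl = TRUE)

# Greedy .* runs from the first opening marker to the last closing marker
greedy_pattern <- "(?s)Embed from Getty Images.*false \\}\\)\\}\\);\n"
greedy <- gsub(greedy_pattern, "", AmericanTexts, perl = TRUE)

print(cleaned)
print(greedy)
```

With the lazy pattern the text between the two pictures survives; with the greedy one it is removed together with both embed blocks.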

Related

Find strings that start and end with certain characters [closed]

Closed 5 years ago.
I'm working on a text-mining project with data from Twitter. In my data frame, many words have been converted into Unicode escape codes, e.g.
<U+0E2B><U+0E25><U+0E07><U+0E1E>
I want to collect all converted words like the ones above and put them into one large string so I can deal with them separately.
Is there any way I can find all the strings that start with <U+ and end with > using R?

Your request is a bit imprecise, so I'm taking the liberty to make a few assumptions on how you want the output.
text <- "Words <Q+0E2B><U+0E2B2>, 1 < 2, <p>
<U+0E2B><U+0E25><U+0E07><U+0E1E> </p> some more words"
regmatches(text, gregexpr("<U\\+[0-9A-Z]{4}>", text))
# "<U+0E2B>" "<U+0E25>" "<U+0E07>" "<U+0E1E>"
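To get the "one large string" the question asks for, the matches can be flattened and pasted together. A small sketch, assuming the text lives in a character vector (tweets is a hypothetical name and the sample values are invented). The character class is narrowed to [0-9A-F] here, on the assumption that the codes are always four hex digits:

```r
# Hypothetical input: one element per tweet
tweets <- c(
  "hello <U+0E2B><U+0E25> world",
  "1 < 2 <p> <U+0E07><U+0E1E> <Q+0E2B>"
)

# gregexpr finds every match per element; regmatches extracts them as a list
matches <- regmatches(tweets, gregexpr("<U\\+[0-9A-F]{4}>", tweets))

# Flatten the list and glue all codes into a single string
combined <- paste(unlist(matches), collapse = "")
print(combined)
```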

Can I group * things from this data? [closed]

Closed 5 years ago.
I have data in ASCII format.
I read it into R; some values are marked with * when they fall under a different condition.
V1, V2, V3, V4 and V5 don't mean anything different. All that matters is to separate out the values marked with *.
I tried c(V1,V2,V3,V4,V5), but it returns only the levels.
I have no idea how to proceed.
Questions: Can I pick out the values marked with * via some code?
Is there a way to combine these columns into one data object?

Select the values marked with *. I guess these values come with the symbol from the original file, right?
In this case use:
# Find the positions of all cells that contain a literal *
position <- grep('\\*', as.matrix(distress[]))
# Extract those cells, then strip the * and convert to numbers
selectedValues <- as.matrix(distress[])[position]
numericValues <- as.numeric(gsub('\\*', '', selectedValues))
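Since the original data is only available as an image, here is a self-contained sketch with an invented data frame (the values in distress below are made up, and the columns are deliberately factors to mirror the "returns only the levels" symptom). It uses fixed = TRUE so the * is treated as a plain character and needs no escaping:

```r
# Hypothetical stand-in for the real data; columns are factors on purpose
distress <- data.frame(
  V1 = c("12", "7*"),
  V2 = c("3*", "45"),
  stringsAsFactors = TRUE
)

# as.matrix turns the factor columns into one character matrix
values <- as.matrix(distress)

# Positions (column-major) of the cells that contain a literal *
position <- grep("*", values, fixed = TRUE)
selectedValues <- values[position]

# Strip the marker and convert to numbers
numericValues <- as.numeric(sub("*", "", selectedValues, fixed = TRUE))
print(numericValues)
```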

How to move specific data from one column to another using r [closed]

Closed 6 years ago.
I'm having a problem trying to move data from my address column to my postal code column.
For example:
On the second line I'm trying to take "Dublin 22" from the data.Address column and move it to the data.Postal.Code column.
I'm using R but I have no idea how to implement it.
Any suggestions?

Try this:
data.Postal.Code <- gsub("^.*, (.*)$", "\\1", data.Address)
Update:
If you want to move Dublin 22 to the postal code column whenever it appears in the address then you can try the following:
data.Postal.Code[grepl("^.* Dublin 22$", data.Address)] <- "Dublin 22"
Here is a demo of the regex used:
Regex101
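A sketch of how the move could look on a whole data frame, assuming the address ends with a district such as "Dublin 22" after a comma. The data frame, its column names (Address, Postal.Code) and the sample rows are hypothetical, since the original data was only shown as an image:

```r
# Hypothetical data; the second row has the district inside the address
data <- data.frame(
  Address = c("1 Main Street, Cork", "5 Hill Road, Clondalkin, Dublin 22"),
  Postal.Code = c("T12", NA),
  stringsAsFactors = FALSE
)

# A trailing ", Dublin <number>" is treated as the postal district
pattern <- ", (Dublin [0-9]+)$"
has_district <- grepl(pattern, data$Address)

# Copy the district into the postal code column ...
data$Postal.Code[has_district] <-
  sub(paste0("^.*", pattern), "\\1", data$Address[has_district])

# ... and remove it from the address
data$Address[has_district] <- sub(pattern, "", data$Address[has_district])
print(data)
```

Rows without a trailing district are left untouched, so existing postal codes are not overwritten.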

Excel messes up some dots (".") in a number [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
I have a tab delimited file:
I created this file in R as a data.frame and wrote it to the above file using write.table(dataFrame,"filepath",row.names=FALSE). However, after I opened it in Excel, some cells showed #####:
The only difference between the tab-delimited file and the Excel file is that in the Excel file the . is omitted, but I have no idea how this is possible, because most of the other numbers are just fine. Any suggestion to fix this problem is welcome.
Update
I can fit the data in the column:
However there should be a . after the 1

Your import settings are probably wrong regarding the separators for thousands and decimals. Notice that the problem arises when the leading digit is 1 or greater (that is, not 0). Excel interprets such a number as thousands, because it wouldn't make sense for Excel to convert a number that begins with 0 into thousands. So you have to fix this:
You have to do this in the last step of importing the file: click on Advanced, then set the Decimal separator to . and the Thousands separator to , (or vice versa if you prefer, of course, but in your case it has to be this).
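An alternative on the R side is to write the file with the separators Excel already expects. This is only a sketch under the assumption that your Excel locale uses a comma as the decimal mark; the data frame and file path below are invented for illustration:

```r
# Hypothetical data frame standing in for dataFrame
df <- data.frame(id = c("a", "b"), value = c(0.5, 1.234567))

# Temporary file used here instead of a real path
path <- tempfile(fileext = ".txt")

# sep = "\t" makes the file truly tab-delimited; dec = "," writes 1,234567
write.table(df, path, sep = "\t", dec = ",", row.names = FALSE, quote = FALSE)

lines <- readLines(path)
print(lines)
```

If your Excel locale uses . as the decimal mark, leave dec at its default and fix the import settings as described above.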

How to Write body in POST() in R Using "httr" Package [closed]

Closed 6 years ago.
I got stuck figuring out the proper usage of POST in the "httr" package. I use it as follows.
getdata <- POST(url=url,add_headers(`Content-Type`="application/json"),
body=list(user="xxx",password="xxx"))
The result is not what it should be. I didn't find good examples on the internet.
Even after I remove add_headers, it is still not right.
getdata <- POST(url=url,body=list(user="xxx",password="xxx"))
The result is as follows:
fromJSON(content(getdata,type="text"))
$success
[1] FALSE
BTW, can I use add_headers in POST? Thank you in advance.
Thank you guys. I found a solution for my situation anyway.
The body can be passed as a character string, which is much easier than using the list format.
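For reference, a sketch of what that can look like. By default httr encodes a list body as multipart form data, which a JSON endpoint may reject even when the Content-Type header says JSON. The function names below are hypothetical helpers, and the field names (user, password) are taken from the question; whether your server expects exactly this payload is an assumption:

```r
# Build the JSON body as a character string
build_login_body <- function(user, password) {
  as.character(
    jsonlite::toJSON(list(user = user, password = password), auto_unbox = TRUE)
  )
}

# Option 1: send the character string and declare it as JSON
post_login_string <- function(url, user, password) {
  httr::POST(url,
             body = build_login_body(user, password),
             httr::content_type_json())
}

# Option 2: keep the list and let httr encode it as JSON
post_login_list <- function(url, user, password) {
  httr::POST(url,
             body = list(user = user, password = password),
             encode = "json")
}

print(build_login_body("xxx", "xxx"))
```

And yes, add_headers() can be passed to POST() alongside the body, as in the first attempt in the question.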
