Extract date from URL link / random string [closed] - r

Closed 3 years ago.
I would like to extract dates from a column of URL links (5,000 rows of raw data).
Samples of the URLs include:
http://en/Pages/Introduction-More_Details-20191103.com
http://en/Pages/United-Kingdom-Page1-EU-20190502.com
http://en/Pages/France-2019-Description-20190612.com
http://en/Pages/telephone-in-the-UK-and-USA-190405.com
Is there any R code that can learn the pattern and extract the date to another column?
Thank you.
The varying lengths of the text could be a problem...

At least from your sample, it looks like the dates are the only numbers and that they always follow a hyphen (-). You could capture them with a regex:
urls <- c('http://en/Pages/Introduction-More_Details-20191103.com',
'http://en/Pages/United-Kingdom-EU-20190502.com',
'http://en/Pages/France-20190612.com',
'http://en/Pages/telephone-in-the-UK-and-USA-190405.com')
gsub('(.*)-(\\d{6,8})(.*)', '\\2', urls)
#[1] "20191103" "20190502" "20190612" "190405"
Or
gsub('(.*)-(\\d{6,8})(\\.com)', '\\2', urls)
Then you can save that to a new column. Obviously, how easy it is to pick up all the dates depends on how many different formats you have.
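As a possible follow-up (my own sketch, not part of the answer above): the extracted strings can be converted to actual Date values. The 6-digit dates appear to be in yymmdd form, so one might pad them to yyyymmdd first before parsing.

```r
urls <- c('http://en/Pages/Introduction-More_Details-20191103.com',
          'http://en/Pages/telephone-in-the-UK-and-USA-190405.com')

# Pull out the digit run before ".com", as in the answer above
raw <- gsub('(.*)-(\\d{6,8})(\\.com)', '\\2', urls)

# Prefix "20" to 6-digit dates so everything is yyyymmdd (assumes all
# dates are in the 2000s -- check this against your real data)
raw <- ifelse(nchar(raw) == 6, paste0('20', raw), raw)

as.Date(raw, format = '%Y%m%d')
# [1] "2019-11-03" "2019-04-05"
```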

Related

Is there an R function to run the same filter command on all of my columns? [closed]

Closed 1 year ago.
I have an Excel database with around 250 objects (names of different people), and I would like to know if there's a function to perform the same command on all of my objects. I have been using grep() with each individual name, but I would like to obtain the URLs for each name without doing it manually. Is there an easier way of doing it?
`Alejandro Díaz Domínguez` [grep(".gob.mx", `Alejandro Díaz Domínguez`)]
[1] "http://www.csg.gob.mx"
[2] "http://www.csg.gob.mx"
[3] "https://sic.gob.mx"
If your pattern is ".gob.mx" for all columns and every column has a person, you may want to use lapply().
lapply(your_dataframe, function(x) x[grep(".gob.mx", x)])
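A quick illustration (the data frame and column names below are made up, since the original data isn't shown). One caveat worth noting: in a regex, "." matches any character, so fixed = TRUE makes the pattern literal, which is usually what you want for ".gob.mx":

```r
# Hypothetical data: each column is one person's collected URLs
df <- data.frame(
  person_a = c("http://www.csg.gob.mx", "http://example.com"),
  person_b = c("https://sic.gob.mx", "http://other.org"),
  stringsAsFactors = FALSE
)

# Keep, per column, only the entries containing the literal ".gob.mx"
lapply(df, function(x) x[grep(".gob.mx", x, fixed = TRUE)])
# $person_a
# [1] "http://www.csg.gob.mx"
#
# $person_b
# [1] "https://sic.gob.mx"
```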

In R, is there a way to use regular expressions or something similar to extract the first and last character of an email string? [closed]

Closed 4 years ago.
Currently in R, with data.table, I have the following column:
jamesmann#yahoo.com
bill.free#yahoo.com
computer.trader#yahoo.com
j*****n#gmail.com
which are factors. I would like to parse the above so that I can get the first and last letters of the username before the # symbol.
So for the above I'd like to get:
jn
be
cr
jn
I deal with some asterisked usernames so I added it in too. Is there a simple way to do this? Any thoughts would be greatly appreciated.
Match the following pattern to the strings and replace it with the capture groups:
sub("(.).*(.)#.*", "\\1\\2", s)
## [1] "jn" "be" "cr" "jn"
Note
The input strings in reproducible form are:
s <- c("jamesmann#yahoo.com", "bill.free#yahoo.com", "computer.trader#yahoo.com",
"j*****n#gmail.com")
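Not part of the answer above, but since the question mentions the column is stored as factors: sub() coerces factors to character automatically, though an explicit as.character() keeps things tidy. A small sketch with two of the sample strings:

```r
# Factor column, as described in the question
s <- factor(c("jamesmann#yahoo.com", "j*****n#gmail.com"))

# First capture group grabs the first character, second grabs the
# character immediately before "#"; everything else is discarded
sub("(.).*(.)#.*", "\\1\\2", as.character(s))
# [1] "jn" "jn"
```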

Find strings that start and end with certain characters [closed]

Closed 5 years ago.
I'm working on a text-mining project with data from twitter. In my data frame, many words are converted into Unicode characters, e.g.
<U+0E2B><U+0E25><U+0E07><U+0E1E>
I want to collect every converted word like the above and put them into one large string so I can deal with them separately.
Is there any way I can find all the strings that start with <U+ and end with > using R?
Your request is a bit imprecise, so I'm taking the liberty of making a few assumptions about how you want the output.
text <- "Words <Q+0E2B><U+0E2B2>, 1 < 2, <p>
<U+0E2B><U+0E25><U+0E07><U+0E1E> </p> some more words"
regmatches(text, gregexpr("<U\\+[0-9A-Z]{4}>", text))
# "<U+0E2B>" "<U+0E25>" "<U+0E07>" "<U+0E1E>"
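Since the question asks to put all of the matched tokens into one large string, a possible follow-up (my own sketch, reusing the same sample text) is to collapse the matches with paste():

```r
text <- "Words <Q+0E2B><U+0E2B2>, 1 < 2, <p>
<U+0E2B><U+0E25><U+0E07><U+0E1E> </p> some more words"

# Extract all "<U+XXXX>" tokens, then join them into one string
m <- regmatches(text, gregexpr("<U\\+[0-9A-Z]{4}>", text))[[1]]
paste(m, collapse = "")
# [1] "<U+0E2B><U+0E25><U+0E07><U+0E1E>"
```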

In R, find value in one CSV, isolate it in another [closed]

Closed 7 years ago.
I'm working on a project in R regarding baseball, and I have two CSVs that I'm working with. One file, CSV2 ("PitchingPost.csv"), contains all postseason pitching stats; the column I'm looking at there is "teamID". I'm trying to evaluate regular-season pitching stats in another file, CSV1 ("pitching.csv"), but only for teams that made the postseason. So I'm trying to remove all of the rows in CSV1 EXCEPT for those whose "teamID" occurs in CSV2's "teamID".
Help?
To keep only the rows from your first file that share an ID with rows in your second file, you could try something like this:
pitch <- read.csv("pitching.csv")
pitch_post <- read.csv("PitchingPost.csv")
pitch <- pitch[pitch$teamID %in% unique(pitch_post$teamID),]
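A self-contained sketch with made-up stats (the real CSVs aren't available here), showing what the %in% filter does:

```r
# Hypothetical regular-season and postseason tables
pitch <- data.frame(teamID = c("NYA", "BOS", "TEX"),
                    ERA = c(3.9, 4.1, 4.5),
                    stringsAsFactors = FALSE)
pitch_post <- data.frame(teamID = c("NYA", "BOS"),
                         stringsAsFactors = FALSE)

# Keep only regular-season rows whose teamID appears in the postseason file
pitch[pitch$teamID %in% unique(pitch_post$teamID), ]
#   teamID ERA
# 1    NYA 3.9
# 2    BOS 4.1
```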

Why does data get altered while applying a function [closed]

Closed 8 years ago.
I loaded a RDS file. The file contains a numeric field. When I say
class(NEI$Emissions)
it returns
"numeric"
The data should have at most 3 digits before the decimal point and 3 decimal places. However, when I issue the command
max(NEI$Emissions)
it returns a huge number.
646952
How can I use the numeric values as it is?
R doesn't lie. One of your data points is not what you expect.
Find which row has the problem with this command:
which.max(NEI$Emissions)
then examine that row of your original data. You will find the errant value.
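A toy illustration (NEI itself isn't available here) of how a single outlier dominates max() while which.max() locates it:

```r
# Mostly small values, plus one errant data point
Emissions <- c(12.345, 9.870, 646952, 15.001)

max(Emissions)       # 646952
which.max(Emissions) # 3 -- now inspect row 3 of the original data
```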
