How to remove ID from write.table CSV output [duplicate] - r

When I execute this statement:
write.table(bigdata, 'news.csv', sep = ',')
It outputs like this:
"type","text"
"1","neutral","The week in 32 photos"
"2","neutral","Look at me! 22 selfies of the week"
"3","neutral","Inside rebel tunnels in Homs"
"4","neutral","Voices from Ukraine"
"5","neutral","Water dries up ahead of World Cup"
"6","positive","Who's your hero? Nominate them"
However, I don't want that ID column, the numbers that appear like 1,2,3,4... I just want this:
"type","text"
"neutral","The week in 32 photos"
"neutral","Look at me! 22 selfies of the week"
"neutral","Inside rebel tunnels in Homs"
"neutral","Voices from Ukraine"
"neutral","Water dries up ahead of World Cup"
"positive","Who's your hero? Nominate them"
Here is my dataframe:
> head(bigdata)
type text
1 neutral The week in 32 photos
2 neutral Look at me! 22 selfies of the week
3 neutral Inside rebel tunnels in Homs
4 neutral Voices from Ukraine
5 neutral Water dries up ahead of World Cup
6 positive Who's your hero? Nominate them
How can I remove the ID from the output?

Specifying row.names = FALSE in the write.table() call removes that column. The numbers are the data frame's row names, not an actual ID column in your data.
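A minimal sketch of this, assuming a two-row stand-in for bigdata and a temporary file instead of 'news.csv' (both chosen only for illustration):

```r
# Small stand-in for the asker's data frame (values taken from the question)
bigdata <- data.frame(
  type = c("neutral", "positive"),
  text = c("Voices from Ukraine", "Who's your hero? Nominate them"),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")  # temporary path, used only for this sketch

# row.names = FALSE suppresses the "1","2",... column
write.table(bigdata, out_file, sep = ",", row.names = FALSE)

# Read the file back to inspect what was written
csv_lines <- readLines(out_file)
print(csv_lines)
```

write.csv(bigdata, out_file, row.names = FALSE) should give the same layout, since write.csv is a wrapper around write.table with sep = "," already set.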

Related

Duplicate row and string manipulation in R

I have a dataframe in R which has some rows as follows:
c("LouDobbs", "gen_jackkeane") || RT #LouDobbs: #AmericaFirst- #gen_jackkeane: The Taliban for 9 months have told their fighters to kill as many people as you can, to includ…
Above is an example of one row with 2 columns (shown here separated by ||): column 1 has more than one username and column 2 has the tweet text. I want each such row to be duplicated once per user (2 times here), so that column 1 holds a single user per row. This should apply to every row in the data frame where more than one user is listed against the tweet text.
structure(list(user = list("Dandhy_Laksono", c("LouDobbs", "gen_jackkeane"
), "DeepStateExpose", "AndruewJamess", "jrossman12", "BiLLRaY2019",
"DeepStateExpose", "Dandhy_Laksono", "DeepStateExpose", "DeepStateExpose"),
full_text = c("RT #Dandhy_Laksono: Sebagian pendukung Jokowi ini mengalami bagaimana fitnah \"komunis dan PKI\" digunakan selama pemilu.\n\nSekarang mereka me…",
"RT #LouDobbs: #AmericaFirst- #gen_jackkeane: The Taliban for 9 months have told their fighters to kill as many people as you can, to includ…",
"RT #DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…",
"RT #AndruewJamess: #BillOReilly #KamalaHarris is wrong. #realDonaldTrump has accomplished a lot. He set a record for incoherent toilet twe…",
"RT #jrossman12: #SaraCarterDC Pakistan won't allow that as you already know. Your husband and the other U.S. troops have been forced to fig…",
"RT #BiLLRaY2019: JOKOWI TIDAK MEMBUNUH KPK..!\nMarkibong…\"Selamat tinggal Taliban di dalam KPK. Kalian kalah lagi, kalah lagi..!\"\n\n#JumatBer…",
"RT #DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…",
"RT #Dandhy_Laksono: Sebagian pendukung Jokowi ini mengalami bagaimana fitnah \"komunis dan PKI\" digunakan selama pemilu.\n\nSekarang mereka me…",
"RT #DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…",
"RT #DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…"
)), row.names = c(NA, 10L), class = "data.frame")
We can use lengths() to get the number of elements in each entry of the list column, then repeat each row's text that many times. lengths() is vectorised, so this should be fast enough:
l1 <- lengths(df$user)
out <- data.frame(user = unlist(df$user), n = rep(l1, l1),
text = rep(df$full_text, l1))
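To see the effect on a small case, here is a sketch with a three-row stand-in for df (the "tweet A/B/C" texts are placeholders, not the asker's data):

```r
# Stand-in data frame with a list column, mimicking the structure in the question
df <- data.frame(full_text = c("tweet A", "tweet B", "tweet C"),
                 stringsAsFactors = FALSE)
df$user <- list("Dandhy_Laksono", c("LouDobbs", "gen_jackkeane"), "DeepStateExpose")

l1 <- lengths(df$user)            # number of users per row: 1, 2, 1

out <- data.frame(
  user = unlist(df$user),         # one user per output row
  text = rep(df$full_text, l1),   # each text repeated once per user
  stringsAsFactors = FALSE
)
print(out)
```

The row with two users becomes two rows that share the same text, and the other rows are unchanged.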

How do you replace words that repeat themselves one after another in R?

In every string where a word is repeated immediately after itself, I want to replace the repetition with a single occurrence of that word.
My strings look something like this:
text_strings <- c("We have to extract these numbers 12, 47, 48", "The integers numbers are also interestings: 189 2036 314",
"','is a separator, so please extract these numbers 125,789,1450 and also these 564,90456", "We like to to offer you 7890$ per month in order to complete this task... we are joking", "You are going to learn 3 things, the first one is not to extract, and 2 and 3 are simply digits.", "Have fun with our mighty test, you are going to support science, progress, mankind wellness and you are going to waste 30 or 60 minutes of your life.", "you can also extract exotic stuff like a456 gb67 and 45678911ghth", "Writing 1 example is not funny, please consider that 66% is validation+testing", "You you are a genius, I think that you like arrays A LOT, [3,45,67,900,1974]", "Who loves arrays more than me?", "{366,78,90,5}Yes, there are only 4 numbers inside", "Integers are fine but sometimes you like 99 cents after the 99 dollars", "100€ are better than 99€", "I like to give you 1000 numbers now: 12 3 56 21 67, and more, [45,67,7]", "Ok ok 1 2 3 4 5 and the last one is 6", "33 trentini entrarono a Trento, tutti e 33 di tratto in tratto trotterellando")
I tried:
gsub("\b(?=\\w*(\\w)\1)\\w+", "\\w", text_strings, perl = TRUE)
But nothing happened (the output remained the same).
How can I remove the repeating words such as in
text_strings[9]
#[1] "You you are a genius, I think that you like arrays A LOT, [3,45,67,900,1974]"
Thank you!
You can use gsub and a regular expression.
gsub("\\b(\\w+)\\W+\\1", "\\1", text_strings, ignore.case=TRUE, perl=TRUE)
[1] "We have to extract these numbers 12, 47, 48"
[2] "The integers numbers are also interestings: 189 2036 314"
[3] "','is a separator, so please extract these numbers 125,789,1450 and also these 564,90456"
[4] "We like to offer you 7890$ per month in order to complete this task... we are joking"
[5] "You are going to learn 3 things, the first one is not to extract, and 2 and 3 are simply digits."
[6] "Have fun with our mighty test, you are going to support science, progress, mankind wellness and you are going to waste 30 or 60 minutes of your life."
[7] "you can also extract exotic stuff like a456 gb67 and 45678911ghth"
[8] "Writing 1 example is not funny, please consider that 66% is validation+testing"
[9] "You are a genius, I think that you like arrays A LOT, [3,45,67,900,1974]"
[10] "Who loves arrays more than me?"
[11] "{366,78,90,5}Yes, there are only 4 numbers inside"
[12] "Integers are fine but sometimes you like 99 cents after the 99 dollars"
[13] "100€ are better than 99€"
[14] "I like to give you 1000 numbers now: 12 3 56 21 67, and more, [45,67,7]"
[15] "Ok 1 2 3 4 5 and the last one is 6"
[16] "33 trentini entrarono a Trento, tutti e 33 di tratto in tratto trotterellando"

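One caveat with the pattern in the answer above: it has no word boundary after the back-reference, so a word followed by a longer word that starts with it (for example "the then") is also collapsed, and a word repeated three times is only reduced to two. A possible variant, offered as a sketch rather than the original answer's method (the test strings are made up):

```r
x <- c("You you are a genius", "Ok ok ok then", "the then")

# Pattern from the answer above, for comparison
original <- gsub("\\b(\\w+)\\W+\\1", "\\1", x, ignore.case = TRUE, perl = TRUE)

# Variant: require a word boundary after the repeated word (\\b)
# and allow the repetition to occur one or more times ((?:...)+)
dedup <- gsub("\\b(\\w+)(?:\\W+\\1\\b)+", "\\1", x, ignore.case = TRUE, perl = TRUE)

print(original)
print(dedup)
```

With the variant, "the then" is left alone and "Ok ok ok then" is reduced to "Ok then".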
how to extract text from anchor tag inside div class in r

I am trying to fetch the text of an anchor tag that is embedded in a div tag. The website is http://mmb.moneycontrol.com/forum-topics/stocks-1.html
The text I want to extract is, for example, "Mawana Sugars".
So I want to extract the names of all the stocks listed on this website, along with their descriptions.
Here is my attempt to do it in R
doc <- htmlParse("http://mmb.moneycontrol.com/forum-topics/stocks-1.html")
xpathSApply(doc,"//div[@class='clearfix PR PB5']//text()",xmlValue)
But, it does not return anything. How can I do it in R?
My answer is essentially the same as the one I just gave here.
The data is dynamically loaded, and cannot be retrieved directly from the html. But, looking at "Network" in Chrome DevTools for instance, we can find a nicely formatted JSON at http://mmb.moneycontrol.com/index.php?q=topic/ajax_call&section=get_messages&offset=&lmid=&isp=0&gmt=cat_lm&catid=1&pgno=1
To get you started:
library(jsonlite)
dat <- fromJSON("http://mmb.moneycontrol.com/index.php?q=topic/ajax_call&section=get_messages&offset=&lmid=&isp=0&gmt=cat_lm&catid=1&pgno=1")
Output looks like:
dat[1:3, c("msg_id", "user_id", "topic", "heading", "flag", "price", "message")]
# msg_id user_id topic heading flag
# 1 47730730 liontrade NMDC Stocks APR
# 2 47730726 agrawalknath Glenmark Glenmark APR
# 3 47730725 bissy91 Infosys Stocks APR
# price
# 1 Price when posted : BSE: Rs. 127.90 NSE: Rs. 128.15
# 2 Price when posted : NSE: Rs. 714.10
# 3 Price when posted : BSE: Rs. 956.50 NSE: Rs. 955.00
# message
# 1 There is no mention of dividend in the announcement.
# 2 Eagerly Waiting for 670 to 675 to BUY second phase of Buying in Cash Delivery. Already Holding # 800.
# 3 6 ✂ ✂--Don t Pay High Brokerage While Trading. Take Delivery Free & Rs 20 to trade in any size - Join Today .👉 goo.gl/hDqLnm
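To illustrate how fromJSON() turns such a response into a data frame without hitting the network, here is a sketch that parses a small hand-written JSON string. The field names mirror the columns shown above, but the payload itself is hypothetical:

```r
library(jsonlite)

# Hypothetical payload: made-up values, field names borrowed from the output above
json_txt <- '[
  {"msg_id": "1", "topic": "NMDC", "heading": "Stocks"},
  {"msg_id": "2", "topic": "Glenmark", "heading": "Glenmark"}
]'

dat_local <- fromJSON(json_txt)          # a JSON array of objects becomes a data frame
stock_names <- unique(dat_local$topic)   # stock names appear to live in the "topic" field
print(stock_names)
```

The same unique(dat$topic) call should work on the real response, assuming the endpoint still returns that structure.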

how to extract word from row

I have a 46 MB CSV file containing data. Essentially, I would like to select only those rows that contain a particular word such as "PRODUCT". There are 600,000 rows in this data. I have used grep() for the string matching. Following are a few lines of my data.
head(test)
Item.Description UQC Year
1 PHARMACEUTICALS PRODUCTS.(MEDICINE) DOLEYKA SYRUP 100 ML NOS 2015
2 Multani mati hesh100gm x 160 (AyurvedicProducts) PAC 2015
3 Amla /Shikakai/ Aritha powder 100gm x 160 (Ayurvedic Products) PAC 2015
4 Godrej h.dye blk 40ml x 36 (Ayurvedic Products) PAC 2015
5 DR. COOLERS HERBAL LOZENGES.(2) DR. COOLERS HERBAL LOZENGES (MINT FLAVOUR) PAC 2015
6 Eno lemon/ regular 100gm x 48 (AyurvedicProducts) PAC 2015
Identifier RITC.Code
30049099
30049011
30049011
30049011
30049011
30049011
I have used test[grep("PRODUCT", rownames(test)), ]. It gives me an error.
1) Try grepl(). It returns a logical vector, which is convenient for subsetting rows.
2) Upper/lower case matters here, and your text contains both.
So try:
test$Item.Description <- tolower(test$Item.Description)
products <- test[grepl("product", test$Item.Description), ]
And yes, searching the relevant column (Item.Description) instead of rownames(test) matters too.
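If you would rather keep the original capitalisation in the column, grepl() also accepts ignore.case = TRUE. A small sketch, assuming a three-row stand-in for test built from the rows shown in the question:

```r
# Stand-in for the asker's data (descriptions shortened from the question)
test <- data.frame(
  Item.Description = c("PHARMACEUTICALS PRODUCTS.(MEDICINE) DOLEYKA SYRUP 100 ML",
                       "Godrej h.dye blk 40ml x 36 (Ayurvedic Products)",
                       "DR. COOLERS HERBAL LOZENGES (MINT FLAVOUR)"),
  UQC = c("NOS", "PAC", "PAC"),
  stringsAsFactors = FALSE
)

# Case-insensitive match on the description column; the column itself is not modified
products <- test[grepl("product", test$Item.Description, ignore.case = TRUE), ]
print(products)
```

This keeps the two rows that mention "PRODUCTS" or "Products" and drops the third.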
Open the CSV file in MS Excel.
Go to the 'Data' menu and click 'Filter'.
In the filter drop-down, select 'Text Filters', then select 'Contains'.
Then type the word 'product'.
The list will be filtered to the rows that contain the word 'product'.

Getting a list of Means by a parameter R

I have data that looks like this:
> head(chf)
Admit.Day.of.Week Type.of.Admission Patient.Disposition
1 SAT Emergency Skilled Nursing Home
2 FRI Elective Home or Self Care
3 FRI Emergency Home w/ Home Health Services
4 MON Emergency Skilled Nursing Home
5 THU Emergency Home or Self Care
6 WED Emergency Skilled Nursing Home
mean_los_dispo
1 8.553525
2 4.224193
3 5.789052
4 8.553525
5 4.224193
6 8.553525
I use the following command to get the column labeled mean_los_dispo:
# Mean LOS for each patient disposition
chf$mean_los_dispo <- ave(chf$Length.of.Stay, chf$Patient.Disposition,
FUN = mean)
What I want to do is set a variable to hold the value of the mean_los_dispo for each of the four different dispositions, for example
SNH = 8.553525
HSC = 4.224193
...
How would I go about doing this? I want to be able to eventually use paste or something similar to put the information in the title of a graph.
You can use paste. So for example, I created two variables, one with numbers (so your means) and another with characters (so your dispositions), and then I used paste to concatenate them:
a<-c(1,2,3,4,5)
b<-c("a","b","c","d","e")
strs<-paste(b," = ",as.character(a),sep="")
This produces:
[1] "a = 1" "b = 2" "c = 3" "d = 4" "e = 5"
In your case you could do something like the following:
unique(paste(chf$Patient.Disposition," = ",as.character(chf$mean_los_dispo),sep=""))
The unique will get rid of all of the duplicates.
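An alternative sketch, if you want one value per disposition that you can look up by name: tapply() returns a named vector of group means. The chf data below is a made-up four-row stand-in, and mean_los and title_text are names introduced only for this example:

```r
# Made-up stand-in for chf with two dispositions
chf <- data.frame(
  Patient.Disposition = c("Skilled Nursing Home", "Home or Self Care",
                          "Skilled Nursing Home", "Home or Self Care"),
  Length.of.Stay = c(10, 4, 6, 2)
)

# One mean per disposition, named by disposition
mean_los <- tapply(chf$Length.of.Stay, chf$Patient.Disposition, mean)

# Look up a single value by name
snh <- mean_los[["Skilled Nursing Home"]]

# Build a string suitable for a plot title
title_text <- paste(names(mean_los), "=", round(mean_los, 2), collapse = "; ")
print(title_text)
```

title_text can then be passed as the main argument of plot() or a similar function.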
