Please find below the code I am using to share my analysis (a data frame) with my friend in R. I am using the sendmailR package and pander:
library(sendmailR)
library(pander)
from <- "<me@gmail.com>"
to <- "<friend@gmail.com>"
subject <- "Important Report of the Day!!"
body <- "This is the result of the test:"
mailControl=list(smtpServer="ASPMX.L.GOOGLE.COM")
#-----------------------------------------------------
msg_content <- mime_part(paste('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
<body><pre>', paste(pander_return(vvv, style = "multiline"), collapse = '\n'), '</pre></body>
</html>'))
msg_content[["headers"]][["Content-Type"]] <- "text/html"
sendmail(from=from,to=to,subject=subject,msg=msg_content,control=mailControl)
Problem is that in the email the table is broken into two parts (an 8-column table and a 4-column table).
How do I change my code so that my 12-column table remains intact?
After adding this line
panderOptions('table.split.table', Inf)
This is the email that I am getting.
You have to increase or disable the default maximum width of the resulting markdown table via the split.tables argument of pandoc.table (which can also be passed in the pander call, since pander forwards extra arguments to pandoc.table), or update the global option via panderOptions.
Quick example on updating your pander call:
paste(pander_return(pander(vvv, split.tables = Inf)), collapse = '\n')
Or set that globally for all future pander calls:
panderOptions('table.split.table', Inf)
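A quick, self-contained illustration of the difference (wide_df below is my stand-in for your vvv; I am padding mtcars to get a 12-column table):

```r
library(pander)

# a 12-column stand-in for vvv
wide_df <- head(mtcars)                   # 11 columns
wide_df$extra <- seq_len(nrow(wide_df))   # now 12

panderOptions('table.split.table', Inf)   # never split wide tables

md <- pander_return(wide_df, style = "multiline")
# a split table would contain pander's "Table continues below" marker;
# with the option set to Inf it no longer does
any(grepl("continues below", md))
```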
This is for a school project.
In RStudio version 2022.02.1+461, I'm trying to download a csv file from GitHub, then import to data frame.
Here's what's in the chunk:
confirmed_url <- "https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
DF_csv <- read.csv(confirmed_url, stringsAsFactors = FALSE)
head(DF_csv)
X..DOCTYPE.html.
1 <html lang=en data-color-mode=auto data-light-theme=light data-dark-theme=dark >
2 <head>
3 <meta charset=utf-8>
4 <link rel=dns-prefetch href=https://github.githubassets.com>
5 <link rel=dns-prefetch href=https://avatars.githubusercontent.com>
6 <link rel=dns-prefetch href=https://github-cloud.s3.amazonaws.com>
No error messages appear during rendering. I've also checked the tail; it's not usable either. The data frame has 431 obs. of 1 variable.
I've checked the package versions and the IDE version, and I've tried curl with read.csv(), with the same result. It's supposed to be a time series, so I was expecting some sort of numerical data for analysis.
How do I fix this?
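Not part of the question, but for context: a github.com/.../blob/... address returns GitHub's HTML page for the file, not the file itself, which is why read.csv() ingests markup. A sketch of the usual fix, rewriting the address to its raw.githubusercontent.com equivalent (this rewrite rule is the standard GitHub convention):

```r
blob_url <- "https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

# raw files live on raw.githubusercontent.com, with the "/blob" segment dropped
raw_url <- sub("github\\.com", "raw.githubusercontent.com",
               sub("/blob/", "/", blob_url))

# DF_csv <- read.csv(raw_url, stringsAsFactors = FALSE)  # parses as CSV now
```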
I can't get (web scrape) HTML tree content with the R function xmlTreeParse, by which I mean an ordinary page with products.
I have loaded the RCurl and XML libraries.
myurln3<-"www.amazon.com/s?k=router+hand+plane+cheap&i=arts-crafts-intl-ship&ref=nb_sb_noss"
html_page<-xmlTreeParse(myurln3, useInternalNodes = TRUE)
Error: XML content does not seem to be XML:
'www.amazon.com/s?k=router+hand+plane+cheap&i=arts-crafts-intl-ship&ref=nb_sb_noss'
I expect to scrape the page and get the full HTML structure.
I'm back to web scraping with R after some other projects, and still running into problems.
> library(XML)
Warning message:
package 'XML' was built under R version 3.5.3
> my_url99 <- "https://www.amazon.com/s?k=Dell+laptop+windows+10&ref=nb_sb_noss_2"
> html_page99 <- htmlTreeParse(my_url99, useInternalNode=TRUE)
Warning message:
XML content does not seem to be XML: 'https://www.amazon.com/s?k=Dell+laptop+windows+10&ref=nb_sb_noss_2'
> head(html_page99)
Error in `[.XMLInternalDocument`(x, seq_len(n)) :
No method for subsetting an XMLInternalDocument with integer
> html_page99
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>https://www.amazon.com/s?k=Dell+laptop+windows+10&ref=nb_sb_noss_2</p></body></html>
But I need to scrape the above page with full content, I mean the content with the $ sign on the left (maybe that's not the best description) and all the tags.
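One thing worth noting about the output above: htmlTreeParse() does not download pages over HTTPS by itself; given a bare URL string it falls back to parsing the string as content, which is exactly the one-paragraph "document" shown. A sketch of the usual workaround, fetching the page first with RCurl (Amazon may still block or vary responses for scripted clients, so treat this as illustrative only):

```r
library(RCurl)
library(XML)

my_url <- "https://www.amazon.com/s?k=Dell+laptop+windows+10&ref=nb_sb_noss_2"

# 1) fetch the raw HTML ourselves (some sites refuse requests with no user agent)
raw_html <- getURL(my_url, followlocation = TRUE, useragent = "Mozilla/5.0")

# 2) hand the *content*, not the URL, to the parser
html_page <- htmlParse(raw_html, asText = TRUE)
# xpathSApply(html_page, "//title", xmlValue)
```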
For instance, I want to scrape flight data for flights operating between Chicago (ORD) and New Delhi (DEL). I would search for the flights on makemytrip, and this is the URL that gets generated - http://us.makemytrip.com/international/listing/exUs/RT/ORD-DEL-D-22May2016_JFK-DEL-D-25May2016/A-1/E?userID=90281463121653408
When I am trying to read this HTML page using rvest package, this is what I get -
htmlpage<-read_html("http://us.makemytrip.com/international/listing/exUs/RT/ORD-DEL-D-22May2016_JFK-DEL-D-25May2016/A-1/E?userID=90281463121653408")
htmlpage
{xml_document}
<html>
[1] <head>\n <meta http-equiv="Content-Type" cont ...
[2] <body onload="done_loading();">\n\n <div id= ...
myhtml<-html_nodes(htmlpage,".flight_info")
> myhtml
{xml_nodeset (0)}
I need help parsing/scraping this data and understanding what is going wrong here.
Thanks!
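For context on what is likely happening: the .flight_info selector itself is fine, but read_html() only sees the static HTML the server returns, and sites like makemytrip inject the flight listings with JavaScript after page load, so the node set comes back empty. A minimal sketch showing the same selector matching when the markup is actually present in the document (the class name is reused from the question purely for illustration):

```r
library(rvest)

# static HTML that already contains the node the question searches for
static_doc <- read_html('<div class="flight_info">ORD-DEL</div>')
length(html_nodes(static_doc, ".flight_info"))  # 1 here, but 0 on the live page
```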
I'm working with the 'pander' and 'sendmailR' packages to send a small data frame in the body of an email, rather than as an attachment. I'd like to send it from and to a Gmail account.
I'm close, but the column headers won't align with the columns themselves in the email body the way they do in RStudio, for example; basically, the column headers are too wide to line up with the data columns below them.
It seems the problem is the way the dashes and whitespaces are compressed in various email clients (I tried this in gmail, yahoo and hotmail through the web and through the email client that ships with OS X Mavericks). I was able to remedy the problem in my OS X email client by going to 'preferences' and checking the box labeled 'use fixed-width font for plain-text messages' but I'd like it to work on multiple devices, with multiple clients, etc for many of my coworkers so I'm wondering if there's a way that doesn't involve global email settings.
Here is the code to reproduce the problem:
library(sendmailR) # for emails from R
library(pander) # for table-formatting that does not require HTML
results <- head(iris)
pander(results) # widths look great so far...
a = pandoc.table.return(results)
strsplit(a, "\n") # widths still look great...
panderOptions('table.split.table', Inf) # show all columns on same line
msg_content <- mime_part(
pandoc.table.return(results, style = "multiline")
)
# I'm using my own gmail address for email_from and email_to
sendmail(from = email_from,
to = email_to,
subject = "test",
msg = msg_content
)
… and the email received has the problem described above.
The problem with plain text e-mails and using markdown tables is that the e-mail client usually displays the text with a non-fixed font, and you have to use custom settings in all your e-mail client to override that (like you did with your OS X e-mail client). On the other hand, that's why HTML mails are trending :)
So let's create a HTML mail and include the markdown table in a pre block:
msg_content <- mime_part(paste('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
<body><pre>', paste(pander_return(results, style = "multiline"), collapse = '\n'), '</pre></body>
</html>'))
Due to a bug in sendmailR, we have to override the Content-Type header to HTML:
msg_content[["headers"]][["Content-Type"]] <- "text/html"
And now it's ready to be sent via the command you used in your example.
The table should look similarly fine in any other HTML-capable e-mail client. Please note that this way you could also use HTML tables instead of markdown if that would fit your needs better.
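For instance, if you would rather send a real HTML table than markdown wrapped in a pre block, one option (my suggestion, using knitr rather than anything from the question) is to generate the table markup with knitr::kable and substitute it for the <pre> element in the message body:

```r
library(knitr)

results <- head(iris)

# render the data frame as an HTML <table> and collapse it into one string,
# ready to be pasted into the <body> of the message instead of the <pre> block
html_table <- paste(kable(results, format = "html"), collapse = "\n")
```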
I am gathering data about different universities, and I have a question about the error I get after executing the following code. The problem occurs when using htmlParse().
Code:
library(RCurl)
library(XML)
url1 <- "http://nces.ed.gov/collegenavigator/?id=165015"
webpage1 <- getURL(url1)
doc1 <- htmlParse(webpage1)
Output:
Error in htmlParse(webpage1) : File
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head id="ctl00_hd"><meta http-equiv="Content-type" content="text/html;charset=UTF-8" /><title>College Navigator - National Center for Education Statistics</title><link href="css/md0.css" type="text/css" rel="stylesheet" /><meta name="keywords" content="college navigator,college search,postsecondary education,postsecondary statistics,NCES,IPEDS,college locator" /><meta name="description" content="College Navigator is a free consumer information tool designed to help students, parents, high school counselors, and others get information about over 7,000 postsecondary institutions in the United States - such as programs offered, retention and graduation rates, prices, aid available, degrees awarded, campus safety, and accreditation." /><meta name="robots" content="index,nofollow" /> ...
I have web scraped pages before using this package and I never had an issue. Does the name="robots" meta tag have anything to do with it? Any help would be greatly appreciated.
The W3C validator report at
http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fnces.ed.gov%2Fcollegenavigator%2F%3Fid%3D165015
indicates the webpage is badly formed. Your browser can compensate for this, but your R package is having problems.
If you are using Windows, you can get the IE browser to fix it for you as follows:
library(rcom)  # Windows COM automation
library(XML)
ie <- comCreateObject('InternetExplorer.Application')
ie[["visible"]] <- TRUE  # show the browser window; useful for debugging
comInvoke(ie, "Navigate2", "http://nces.ed.gov/collegenavigator/?id=165015")
# wait until IE has finished loading the page (ReadyState 4 = complete)
while (comGetProperty(ie, "busy") || comGetProperty(ie, "ReadyState") < 4) {
  Sys.sleep(1)
  print(comGetProperty(ie, "ReadyState"))
}
# grab the DOM as repaired by IE (COM collections are 0-indexed)
myDoc <- comGetProperty(ie, "Document")
webpage1 <- myDoc$getElementsByTagName('html')[[0]][['innerHTML']]
ie$Quit()
doc1 <- htmlParse(webpage1)
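If you are not on Windows (or want to skip COM altogether), it may be enough to tell htmlParse() explicitly that you are passing content rather than a file name, and to rely on libxml2's forgiving HTML parser to recover from the malformed markup. This is a sketch under that assumption, not a guaranteed fix for this particular page:

```r
library(RCurl)
library(XML)

webpage1 <- getURL("http://nces.ed.gov/collegenavigator/?id=165015")

# asText = TRUE: parse the string as HTML content, never as a path or URL;
# the HTML parser recovers from unclosed/invalid tags by default
doc1 <- htmlParse(webpage1, asText = TRUE)
```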