Connecting R to Twitter for sentiment analysis

I referred to the link given below for doing sentiment analysis:
http://heuristically.wordpress.com/2011/04/08/text-data-mining-twitter-r/
And when I ran the code given below:

mydata.vectors <- character(0)  # tweets accumulate across pages, so start empty
for (page in c(1:15)){
  # search parameter
  twitter_q <- URLencode('#prolife OR #prochoice')
  # paged search URL (this line was truncated in the original post;
  # reconstructed from the linked tutorial)
  twitter_url <- paste('http://search.twitter.com/search.atom?q=', twitter_q,
                       '&rpp=100&page=', page, sep='')
  # fetch remote URL and parse
  mydata.xml <- xmlParseDoc(twitter_url, asText=F)
  # extract the titles
  mydata.vector <- xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue,
                               namespaces=c('s'='http://www.w3.org/2005/Atom'))
  # aggregate new tweets with previous tweets
  mydata.vectors <- c(mydata.vector, mydata.vectors)
}
After running the code, I am prompted with this error:
Error in UseMethod("xpathApply") :
no applicable method for 'xpathApply' applied to an object of class "NULL"
I/O warning : failed to load HTTP resource
I installed the required packages ROAuth, stringr, XML, and plyr, and I am using R version 3.0.3.
Please help me figure out how to go about this; I have been struggling with it. It would be a great help if anyone could point me in the right direction.
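In case it helps with the diagnosis, a minimal check (just a sketch on my part) is to test whether the fetch-and-parse step returned anything before the XPath call, since the "NULL" in the error suggests xmlParseDoc() came back empty:

library(XML)
mydata.xml <- xmlParseDoc(twitter_url, asText = FALSE)
if (is.null(mydata.xml)) {
  # the I/O warning means the HTTP fetch itself failed, so any
  # downstream XPath call would see a NULL document and error out
  stop(paste("could not fetch or parse", twitter_url))
}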

Related

R packages not connecting to peer for data download

I am working with a couple of R packages for genetic pathway enrichment analyses, and the two packages I am using are now throwing errors when trying to connect to their respective servers to download the reference data for the analysis.
In the first package, gage, I am getting the following error when attempting to download:
library(gage)
> kg.ko = kegg.gsets("ko") # ("ko" is KEGG ortholog pathway)
Error in curl::curl_fetch_memory(url, handle = handle) :
Failure when receiving data from the peer
In the second package, clusterProfiler, I am getting the following error:
library(clusterProfiler)
# the data
dput(head(de_kegg_chr))
c("K14847", "K19009", "K00078", "K21407", "K23285", "K06972")
# KEGG enrichment (which will pull relevant reference data during this step)
# over-representation analysis (fisher's)
> enrich <- enrichKEGG(gene = de_kegg_chr,
+ organism = "ko",
+ keyType='kegg',
+ pvalueCutoff = 0.01)
Reading KEGG annotation online:
fail to download KEGG data...
Error in download.KEGG.Path(species) :
'species' should be one of organisms listed in 'http://www.genome.jp/kegg/catalog/org_list.html'...
In addition: Warning message:
In utils::download.file(url, quiet = TRUE, method = method, ...) :
URL 'https://rest.kegg.jp/link/ko/pathway': status was 'Failure when receiving data from the peer'
After the first error, I thought it was something specific to the gage package and found a simple workaround, because these data are downloaded from the server before the analysis function runs (see the sketch below).
This is more of a problem with the second package, because there the reference data are downloaded inside the function that conducts the analysis.
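For what it's worth, the workaround I mean amounts to fetching and caching the reference data once, wherever the connection works, and loading it from disk afterwards, roughly like this:

# sketch of the caching workaround: run the download once where the
# connection works, save the result, then load it locally from then on
kg.ko <- gage::kegg.gsets("ko")
saveRDS(kg.ko, "kegg_ko_gsets.rds")
kg.ko <- readRDS("kegg_ko_gsets.rds")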
Now that this is happening with more than one package (both of these scripts were working perfectly before yesterday), I'm thinking it is something systemic within R or RStudio.
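One way to test that theory, stepping outside both packages, is to request the same KEGG endpoint directly with curl, the transport both error messages point at:

library(curl)
# fetch the endpoint from the warning message directly; if this also fails,
# the problem is the connection (network, proxy, TLS), not the packages
res <- curl_fetch_memory("https://rest.kegg.jp/link/ko/pathway")
res$status_code  # 200 means the endpoint is reachable from this session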

Problems parsing streamR JSON data

I am attempting to use the streamR package in R to download and analyze Twitter data, on the premise that this library can overcome the limitations of the twitteR package.
When downloading data everything seems to work fabulously using the filterStream function (just to clarify: the function captures Twitter data, and simply running it produces the json file, saved in the working directory, that is used in the further steps):
filterStream( file.name="tweets_test.json",
track="NFL", tweets=20, oauth=credential, timeout=10)
Capturing tweets...
Connection to Twitter stream was closed after 10 seconds with up to 21 tweets downloaded.
However, when moving on to parse the json file, I keep getting all sorts of errors:
readTweets("tweets_test.json", verbose = TRUE)
0 tweets have been parsed.
list()
Warning message:
In readLines(tweets) : incomplete final line found on 'tweets_test.json'
Or with this function from the same package:
tweet_df <- parseTweets(tweets='tweets_test.json')
Error in `$<-.data.frame`(`*tmp*`, "country_code", value = NA) :
replacement has 1 row, data has 0
In addition: Warning message:
In stream_in_int(path.expand(path)) : Parsing error on line 0
I have tried reading the json file with jsonlite and rjson with the same results.
Originally, it seemed that the error came from special characters ({, then \) within the json file, which I tried to clean up following the suggestion from this post; however, not much came out of it.
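For anyone trying the same, a line-by-line parse like the following sketch (assuming one JSON object per line, which is typical for streaming output) can at least isolate the offending lines:

# read the raw stream file; warn = FALSE suppresses the
# incomplete-final-line warning shown above
lines <- readLines("tweets_test.json", warn = FALSE)
lines <- lines[nzchar(lines)]  # drop any empty lines between tweets
parsed <- lapply(lines, function(l) {
  tryCatch(jsonlite::fromJSON(l), error = function(e) NULL)
})
sum(vapply(parsed, is.null, logical(1)))  # count lines that failed to parse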
I found out about the streamR package from this post, which shows the process as very straight forward and simple (which it is, except for the parsing part!).
If any of you have experience with this library and/or these parsing issues, I'd really appreciate your input. I have been searching non-stop but haven't been able to locate a solution.
Thanks!

acs package in R: Cannot download dataset, error message is inscrutable

I am trying to use the acs package in R to download Census data for a basic map, but I am unable to download the data and I'm receiving a confusing error message.
My code is as follows:
#Including all packages here in case this is somehow the issue
install.packages(c("choroplethr", "choroplethrMaps", "tidycensus", "tigris", "leaflet", "acs", "sf"))
library(choroplethr)
library(choroplethrMaps)
library(tidycensus)
library(tigris)
library(leaflet)
library(acs)
library(sf)
library(tidyverse)
api.key.install("my_api_key")
SD_geo <- geo.make(state="CA", county = 73, tract = "*", block.group = "*")
median_income <- acs.fetch(endyear = 2015, span = 5, geography = SD_geo, table.number = "B19013", col.names="pretty")
Everything appears to work until the final command, when I receive the following error message:
trying URL 'http://web.mit.edu/eglenn/www/acs/acs-variables/acs_5yr_2015_var.xml.gz'
Content type 'application/xml' length 735879 bytes (718 KB)
downloaded 718 KB
Error in if (url.test["statusMessage"] != "OK") { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In (function (endyear, span = 5, dataset = "acs", keyword, table.name, :
XML variable lookup tables for this request
seem to be missing from ' https://api.census.gov/data/2015/acs5/variables.xml ';
temporarily downloading and using archived copies instead;
since this is *much* slower, recommend running
acs.tables.install()
This is puzzling to me because 1) it appears that something is in fact being downloaded at first, and 2) 'Error in if (url.test["statusMessage"] != "OK") { :
missing value where TRUE/FALSE needed' makes no sense to me. It doesn't align with any of the arguments in the function.
I have tried:
Downloading the tables using acs.tables.install() as recommended in the second half of the error message. Doesn't help.
Changing the endyear and span to be sure that I'm falling within the years of data supported by the API. I seem to be, according to the API documentation. Have also used the package default arguments with no luck.
Using 'variable =' and the code for the variable as found in the official API documentation. This returns only the two lines with the mysterious "Error in if..." message.
Removing col.names = "pretty"
I'm going to just download the datafile as a CSV and read it into R for now (see the sketch below), but I'd like to be able to run this step from the script for future maps. Any information on what's going on here would be appreciated. I am running R version 3.3.2. Also, I'm new to using this package and the API, but I'm following the documentation and can't find evidence that I'm doing anything wrong.
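The stopgap read-in would be just something like this (the file name is a placeholder for whatever the exported table is called):

# hypothetical file name for the manually downloaded table
median_income_csv <- read.csv("ACS_15_5YR_B19013.csv", stringsAsFactors = FALSE)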
Tutorial I am working off of:
http://zevross.com/blog/2015/10/14/manipulating-and-mapping-us-census-data-in-r-using-the-acs-tigris-and-leaflet-packages-3/#get-the-tabular-data-acs
And documentation of the acs package: http://eglenn.scripts.mit.edu/citystate/wp-content/uploads/2013/02/wpid-working_with_acs_R2.pdf
To follow up on Brandon's comment, version 2.1.1 of the package is now on CRAN, which should resolve this issue.
Your code runs for me. My guess would be that the Census API was temporarily down.
As you loaded tidycensus and you'd like to do some mapping, you might also consider the following code:
library(tidycensus)
census_api_key("your key here") # use `install = TRUE` to install the key
options(tigris_use_cache = TRUE) # optional - to cache the Census shapefile
median_income <- get_acs(geography = "block group",
variables = "B19013_001",
state = "CA", county = "San Diego",
geometry = TRUE)
This will get you the data you need, along with feature geometry for mapping, as a tidy data frame.
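Since the end goal is a basic map, a minimal leaflet sketch on top of that result (my own example, assuming the sf geometry that geometry = TRUE returns) would be:

library(leaflet)
# color scale over the income estimates returned by get_acs()
pal <- colorNumeric("viridis", domain = median_income$estimate)
leaflet(median_income) %>%
  addPolygons(fillColor = ~pal(estimate), fillOpacity = 0.7,
              color = "white", weight = 1) %>%
  addLegend(pal = pal, values = ~estimate, title = "Median household income")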
I emailed Ezra Haber Glenn, the author of the package, about this as I was having the same issue. I received a response within 30 minutes, and it was after midnight, which I thought was amazing. Long story short: the acs package version 2.1.0 is configured to work with the changes the Census Bureau is making to their API later this summer, and in the meantime it is presenting some problems for Windows users. Ezra is going to be releasing an update with a fix, but for now I reverted back to version 2.0 and it works fine. I'm sure there are a few ways to do this, but I installed the devtools package and ran:
require(devtools)
install_version("acs", version = "2.0", repos = "http://cran.us.r-project.org")
Hope this helps anyone else having a similar issue.

R - Read data from HTML table

I'm trying to execute an example from the book "Practical Data Science Cookbook".
The code is as follows:
library(XML)  # readHTMLTable() comes from the XML package
year <- 2013
# Acquire offense data
url <- paste("http://sports.yahoo.com/nfl/stats/byteam?group=Offense&cat=Total&conference=NFL&year=season_",
             year, "&sort=530&old_category=Total&old_group=Offense")
offense <- readHTMLTable(url, encoding = "UTF-8", colClasses = "character")[[7]]
and getting this error:
Error in UseMethod("xmlNamespaceDefinitions") :
no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"
Please help
To solve the problem you need to configure an HTTP proxy.
On a Windows desktop, edit the RStudio shortcut and add the proxy definition after the RStudio name:
http_proxy=http://user_id:password@your_proxy:your_port/
source: Proxy settings for R
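Alternatively, the same proxy can be set from inside R for the current session; this is a generic sketch with placeholder credentials:

# substitute your real proxy host, port, and credentials for the placeholders
Sys.setenv(http_proxy  = "http://user_id:password@your_proxy:your_port/")
Sys.setenv(https_proxy = "http://user_id:password@your_proxy:your_port/")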

Using R package BerkeleyEarth

I'm working for the first time with the R package BerkeleyEarth, and attempting to use its convenience functions to access the BEST data. I think maybe it's just a problem with their servers (a matter I've separately addressed to the package's maintainer) but I wanted to know if it's instead something silly I'm doing.
To reproduce the problem:
library(BerkeleyEarth)
downloadBerkeley()
which produces the following error message:
trying URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
Error in download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
cannot open URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
In addition: Warning message:
In download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
InternetOpenUrl failed: 'A connection with the server could not be established'
Has anyone had a better experience using this package?
The error message is pointing to a different URL than the ones listed at http://berkeleyearth.org/data/ that point to the zip-formatted files. There is another set of .nc files that appear to be more recent. I would replace the entries in the BerkeleyUrls data frame with the ones that match your analysis strategy.
This is the current URL that should be in position [1,1]:
http://berkeleyearth.lbl.gov/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip
And this is the one that is in the package dataframe:
> BerkeleyUrls[1,1]
[1] "http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip"
I suppose you could try:
BerkeleyUrls[, 1] <- sub( "download\\.berkeleyearth\\.org", "berkeleyearth.lbl.gov", BerkeleyUrls[, 1])
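A quick check that the substitution took, plus a retry (assuming downloadBerkeley() picks up the copy of BerkeleyUrls in your workspace):

BerkeleyUrls[1, 1]  # should now point at berkeleyearth.lbl.gov
downloadBerkeley()  # retry the download against the corrected URLs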
