searchTwitter timestamps in R

I am using the twitteR library in R and wondering if it is possible to get timestamps associated with a search, or a timeline for that matter. E.g., if searching #rstats using searchTwitter, I would like to know when the tweets were made. Are there additional parameters I need to pass in order to get that information?
Here is some example code:
library(twitteR)
searchTwitter("#rstats",n=10)
giving the following result:
[[1]]
[1] "MinneAnalytics: #thomaswdinsmore RT #erikriverson: Some thoughts from an observer on the #Rstats track at #BigDataMN. http://t.co/i42PEQHz #R at #CSOM"
[[2]]
[1] "pentalibra: My package ggdendro to draw dendrograms with ggplot2 is back on CRAN. http://t.co/gMviOSnQ Wait a day or so for Windows binary/ #rstats"
[[3]]
[1] "Lachamadice: RT #freakonometrics: \"Regression tree using Gini's index\" http://t.co/tUplMqQj with #rstats"
[[4]]
[1] "Rbloggers: Tracking Number of Historical Clusters: \n(This article was first published on Systematic Investor » R,... http://t.co/jRnWUQ2Y #rstats"
[[5]]
[1] "Rbloggers: ggplot2 multiple boxplots with metadata: \n(This article was first published on mintgene » R, and kindl... http://t.co/re2gghTx #rstats"
[[6]]
[1] "Rbloggers: Learning R using a Chemical Reaction Engineering Book: Part 3: \n(This article was first published on N... http://t.co/agCJi9Rr #rstats"
[[7]]
[1] "Rbloggers: Learning R using a Chemical Reaction Engineering Book: Part 2: \n(This article was first published on N... http://t.co/2qqpgQrq #rstats"
[[8]]
[1] "Rbloggers: Waiting for an API request to complete: \n(This article was first published on Recology - R, and kindly... http://t.co/MZzxHVdw #rstats"
[[9]]
[1] "heidelqekhse3: RT #geospacedman: Just got an openlayers map working on an #rstats #shiny app at #nhshd but... meh."
[[10]]
[1] "jveik: Slides and replay of “Using R with Hadoop” webinar now available #rstats #hadoop | #scoopit http://t.co/Ar2F7We3"

After a quick Google search:
mytweet <- searchTwitter("#chocolate",n=10)
str(mytweet[[1]])
Reference class 'status' [package "twitteR"] with 10 fields
$ text : chr "The #chocolate part of the #croquette. #dumplings #truffles http://t.co/Imwt3tTP"
$ favorited : logi FALSE
$ replyToSN : chr(0)
$ created : POSIXct[1:1], format: "2013-01-27 16:26:03"
$ truncated : logi FALSE
$ replyToSID : chr(0)
$ id : chr "295568362526896128"
$ replyToUID : chr(0)
$ statusSource: chr "<a href="http://instagr.am">Instagram</a>"
$ screenName : chr "tahiatmahboob"
and 33 methods, of which 22 are possibly relevant:
getCreated, getFavorited, getId, getReplyToSID, getReplyToSN, getReplyToUID, getScreenName, getStatusSource, getText, getTruncated,
initialize, setCreated, setFavorited, setId, setReplyToSID, setReplyToSN, setReplyToUID, setScreenName, setStatusSource, setText,
setTruncated, toDataFrame
So the timestamp is:
mytweet[[1]]$created
[1] "2013-01-27 16:26:03 UTC"
Never used twitteR until I read your question. Seems like something fun to do when bored.
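Building on that, you can pull the created field (and any other slot) out of every status in one go. A minimal sketch, assuming searchTwitter has been called as above (untested here, since it needs live API access):

```r
library(twitteR)

tweets <- searchTwitter("#rstats", n = 10)

# collect text and timestamp from each status object;
# do.call("c", ...) keeps the POSIXct class, which sapply would drop
data.frame(
  text    = sapply(tweets, function(x) x$text),
  created = do.call("c", lapply(tweets, function(x) x$created))
)
```

The same pattern works for any of the other fields shown by str() above.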

An alternative to parsing the result (as in the answer above) is to use the since and until arguments.
For example you can do:
res <- searchTwitter("#rstats", n=1000, since='2013-01-24',
                     until='2013-01-28')
searchTwitter is a wrapper around Twitter's JSON search API. Take a look here for more details on the arguments and examples of the JSON results.


Using read_html in R to get Russell 3000 holdings?

I was wondering if there is a way to automatically pull the Russell 3000 holdings from the iShares website in R using the read_html (or rvest) function?
url: https://www.ishares.com/us/products/239714/ishares-russell-3000-etf
(all holdings in the table on the bottom, not just top 10)
So far I have had to copy and paste into an Excel document, save as a CSV, and use read_csv to create a tibble in R of the ticker, company name, and sector.
I have used read_html to pull the S&P 500 holdings from Wikipedia, but can't seem to figure out the path I need to put in to have R automatically pull from the iShares website (and there aren't other reputable websites I've found with all ~3000 holdings). Here is the code used for the S&P 500:
library(rvest)
library(dplyr)

read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
  html_node("table.wikitable") %>%
  html_table() %>%
  select('Symbol', 'Security', 'GICS Sector', 'GICS Sub Industry') %>%
  as_tibble()
First post, sorry if it is hard to follow...
Any help would be much appreciated
Michael
IMPORTANT
According to the Terms & Conditions listed on BlackRock's website (here):
Use any robot, spider, intelligent agent, other automatic device, or manual process to search, monitor or copy this Website or the reports, data, information, content, software, products services, or other materials on, generated by or obtained from this Website, whether through links or otherwise (collectively, "Materials"), without BlackRock's permission, provided that generally available third-party web browsers may be used without such permission;
I suggest you ensure you are abiding by those terms before using their data in a way that violates those rules. For educational purposes, here is how the data would be obtained:
First you need to get to the actual data (not the interactive JavaScript). How familiar are you with the developer tools in your browser? If you navigate through the website and track the network traffic, you will notice a large AJAX request:
https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?tab=all&fileType=json
This is the data you need (all holdings). After locating it, the rest is just cleaning the data. Example:
library(jsonlite)
#Locate the raw data by searching the Network traffic:
url="https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?tab=all&fileType=json"
#pull the data in via fromJSON
x<-jsonlite::fromJSON(url,flatten=TRUE)
>Large list (10.4 Mb)
#use a combination of `lapply` and `rapply` to unlist, structuring the results as one large list
y<-lapply(rapply(x, enquote, how="unlist"), eval)
>Large list (50677 elements, 6.9Mb)
y1<-y[1:15]
> str(y1)
List of 15
$ aaData1 : chr "MSFT"
$ aaData2 : chr "MICROSOFT CORP"
$ aaData3 : chr "Equity"
$ aaData.display: chr "2.95"
$ aaData.raw : num 2.95
$ aaData.display: chr "109.41"
$ aaData.raw : num 109
$ aaData.display: chr "2,615,449.00"
$ aaData.raw : int 2615449
$ aaData.display: chr "$286,156,275.09"
$ aaData.raw : num 2.86e+08
$ aaData.display: chr "286,156,275.09"
$ aaData.raw : num 2.86e+08
$ aaData14 : chr "Information Technology"
$ aaData15 : chr "2588173"
Updated: In case you are unable to clean the data, here you are:
testdf<- data.frame(matrix(unlist(y), nrow=50677, byrow=T),stringsAsFactors=FALSE)
#Where we want to break the DF at (every nth row)
breaks <- 17
#number of rows in full DF
nbr.row <- nrow(testdf)
repeats<- rep(1:ceiling(nbr.row/breaks),each=breaks)[1:nbr.row]
#split DF from clean-up
newDF <- split(testdf,repeats)
Result:
> str(head(newDF))
List of 6
$ 1:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "MSFT" "MICROSOFT CORP" "Equity" "2.95" ...
$ 2:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "AAPL" "APPLE INC" "Equity" "2.89" ...
$ 3:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "AMZN" "AMAZON COM INC" "Equity" "2.34" ...
$ 4:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "BRKB" "BERKSHIRE HATHAWAY INC CLASS B" "Equity" "1.42" ...
$ 5:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "FB" "FACEBOOK CLASS A INC" "Equity" "1.35" ...
$ 6:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "JNJ" "JOHNSON & JOHNSON" "Equity" "1.29" ...

Getting {xml_nodeset (0)} when using html_nodes from rvest package in R

I am trying to scrape headlines off a few news websites using the html_nodes function and the SelectorGadget, but find that some do not work, giving the result "{xml_nodeset (0)}". For example, the code below gives such a result:
url_cnn = 'https://edition.cnn.com/'
webpage_cnn = read_html(url_cnn)
headlines_html_cnn = html_nodes(webpage_cnn,'.cd__headline-text')
headlines_html_cnn
The ".cd__headline-text" I got using the SelectorGadget.
Other websites work such as:
url_cnbc = 'https://www.cnbc.com/world/?region=world'
webpage_cnbc = read_html(url_cnbc)
headlines_html_cnbc = html_nodes(webpage_cnbc,'.headline')
headlines_html_cnbc
Gives a full set of headlines. Any ideas why some websites return the "{xml_nodeset (0)}" result?
Please, please, please stop using Selector Gadget. I know Hadley swears by it but he's 100% wrong. What you see with Selector Gadget is what's been created in the DOM after javascript has been executed and other resources have been loaded asynchronously. Please use "View Source". That's what you get when you use read_html().
Having said that, I'm impressed CNN is as generous as they are (you def can scrape this page) and the content is most certainly on that page, just not rendered (which is likely even better):
Now, that's javascript, not JSON so we'll need some help from the V8 package:
library(rvest)
library(V8)
ctx <- v8()
# get the page source
pg <- read_html("https://edition.cnn.com/")
# find the node with the data in a <script> tag
html_node(pg, xpath=".//script[contains(., 'var CNN = CNN || {};CNN.isWebview')]") %>%
  html_text() %>%  # get the plain text
  ctx$eval()       # send it to V8 to execute it
cnn <- ctx$get("CNN") # get the data ^^ just created
After exploring the cnn object:
str(cnn[["contentModel"]][["siblings"]][["articleList"]], 1)
## 'data.frame': 55 obs. of 7 variables:
## $ uri : chr "/2018/11/16/politics/cia-assessment-khashoggi-assassination-saudi-arabia/index.html" "/2018/11/16/politics/hunt-crown-prince-saudi-un-resolution/index.html" "/2018/11/15/politics/us-khashoggi-sanctions/index.html" "/2018/11/15/middleeast/jamal-khashoggi-saudi-prosecutor-death-penalty-intl/index.html" ...
## $ headline : chr "<strong>CIA determines Saudi Crown Prince personally ordered journalist's death, senior US official says</strong>" "Saudi crown prince's 'fit' over UN resolution" "US issues sanctions on 17 Saudis over Khashoggi murder" "Saudi prosecutor seeks death penalty for Khashoggi killers" ...
## $ thumbnail : chr "//cdn.cnn.com/cnnnext/dam/assets/181025083025-prince-mohammed-bin-salman-small-11.jpg" "//cdn.cnn.com/cnnnext/dam/assets/181025083025-prince-mohammed-bin-salman-small-11.jpg" "//cdn.cnn.com/cnnnext/dam/assets/181025171830-jamal-khashoggi-small-11.jpg" "//cdn.cnn.com/cnnnext/dam/assets/181025171830-jamal-khashoggi-small-11.jpg" ...
## $ duration : chr "" "" "" "" ...
## $ description: chr "The CIA has determined that Saudi Crown Prince Mohammed bin Salman personally ordered the killing of journalist"| __truncated__ "Multiple sources tell CNN that a much-anticipated United Nations Security Council resolution calling for a cess"| __truncated__ "The Trump administration on Thursday imposed penalties on 17 individuals over their alleged roles in the <a hre"| __truncated__ "Saudi prosecutors said Thursday they would seek the death penalty for five people allegedly involved in the mur"| __truncated__ ...
## $ layout : chr "" "" "" "" ...
## $ iconType : chr NA NA NA NA ...
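Given that structure, the headlines themselves are just a character column of the article list; a short accessor sketch, assuming the cnn object built above (untested here, since it needs V8 and a live page):

```r
# headline column of the article list
# (entries may still contain HTML tags such as <strong>)
heads <- cnn[["contentModel"]][["siblings"]][["articleList"]][["headline"]]
head(heads)
```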

List available WFS layers and read into a data frame with rgdal

I have the following problem: according to different sources, it should be possible to read WFS layers in R using rgdal.
dsn<-"WFS:http://geomap.reteunitaria.piemonte.it/ws/gsareprot/rp-01/areeprotwfs/wfs_gsareprot_1?service=WFS&request=getCapabilities"
ogrListLayers(dsn)
readOGR(dsn,"SIC")
The result of that code should be 1) a list of the available WFS layers and 2) a specific layer (SIC) read into R as a Spatial(Points)DataFrame.
I tried several other WFS servers, but it does not work.
I always get the warning:
Cannot open data source
Checking for the WFS driver i get the following result:
> "WFS" %in% ogrDrivers()$name
[1] FALSE
Well it looks like the WFS driver is not implemented in rgdal (anymore?)
Or why are there so many examples "claiming" the opposite?
I also tried the gdalUtils package, and it works, but it prints the whole console output of ogrinfo.exe rather than only the available layers. (I guess it "just" calls ogrinfo.exe and sends the result back to R, like using the shell or system command.)
Does anyone know what I'm doing wrong, or whether something like this is even possible with rgdal or a similar package?
You can combine the two packages to accomplish your task.
First, convert the layer you need into a local shapefile using gdalUtils. Then, use rgdal as normal. NOTE: you'll see a warning message after the ogr2ogr call but it performed the conversion fine for me. Also, ogr2ogr won't overwrite local files without the overwrite parameter being TRUE (there are other parameters that may be of use as well).
library(gdalUtils)
library(rgdal)
dsn <- "WFS:http://geomap.reteunitaria.piemonte.it/ws/gsareprot/rp-01/areeprotwfs/wfs_gsareprot_1?service=WFS&request=getCapabilities"
ogrinfo(dsn, so=TRUE)
## [1] "Had to open data source read only."
## [2] "INFO: Open of `WFS:http://geomap.reteunitaria.piemonte.it/ws/gsareprot/rp-01/areeprotwfs/wfs_gsareprot_1?service=WFS&request=getCapabilities'"
## [3] " using driver `WFS' successful."
## [4] "1: AreeProtette"
## [5] "2: ZPS"
## [6] "3: SIC"
ogr2ogr(dsn, "sic.shp", "SIC")
sic <- readOGR("sic.shp", "sic", stringsAsFactors=FALSE)
## OGR data source with driver: ESRI Shapefile
## Source: "sic.shp", layer: "sic"
## with 128 features
## It has 23 fields
plot(sic)
str(sic@data)
## 'data.frame': 128 obs. of 23 variables:
## $ gml_id : chr "SIC.510" "SIC.472" "SIC.470" "SIC.508" ...
## $ objectid : chr "510" "472" "470" "508" ...
## $ inspire_id: chr NA NA NA NA ...
## $ codice : chr "IT1160026" "IT1160017" "IT1160018" "IT1160020" ...
## $ nome : chr "Faggete di Pamparato, Tana del Forno, Grotta delle Turbiglie e Grotte di Bossea" "Stazione di Linum narbonense" "Sorgenti del T.te Maira, Bosco di Saretto, Rocca Provenzale" "Bosco di Bagnasco" ...
## $ cod_tipo : chr "B" "B" "B" "B" ...
## $ tipo : chr "SIC" "SIC" "SIC" "SIC" ...
## $ cod_reg_bi: chr "1" "1" "1" "1" ...
## $ des_reg_bi: chr "Alpina" "Alpina" "Alpina" "Alpina" ...
## $ mese_istit: chr "11" "11" "11" "11" ...
## $ anno_istit: chr "1996" "1996" "1996" "1996" ...
## $ mese_ultmo: chr "2" NA NA NA ...
## $ anno_ultmo: chr "2002" NA NA NA ...
## $ sup_sito : chr "29396102.9972" "82819.1127" "7272687.002" "3797600.3563" ...
## $ perim_sito: chr "29261.8758" "1227.8846" "17650.289" "9081.4963" ...
## $ url1 : chr "http://gis.csi.it/parchi/schede/IT1160026.pdf" "http://gis.csi.it/parchi/schede/IT1160017.pdf" "http://gis.csi.it/parchi/schede/IT1160018.pdf" "http://gis.csi.it/parchi/schede/IT1160020.pdf" ...
## $ url2 : chr "http://gis.csi.it/parchi/carte/IT1160026.djvu" "http://gis.csi.it/parchi/carte/IT1160017.djvu" "http://gis.csi.it/parchi/carte/IT1160018.djvu" "http://gis.csi.it/parchi/carte/IT1160020.djvu" ...
## $ fk_ente : chr NA NA NA NA ...
## $ nome_ente : chr NA NA NA NA ...
## $ url3 : chr NA NA NA NA ...
## $ url4 : chr NA NA NA NA ...
## $ tipo_geome: chr "poligono" "poligono" "poligono" "poligono" ...
## $ schema : chr "Natura2000" "Natura2000" "Natura2000" "Natura2000" ...
Neither the questioner nor the answerer say how rgdal was installed. If it is a CRAN binary for Windows or OSX, it may well have a smaller set of drivers than an independent installation of GDAL underlying gdalUtils. Always state your platform, and whether rgdal was installed binary or from source, and always provide the output of the messages displayed as rgdal loads, as well as of sessionInfo() to show the platform on which you are running.
Given the possible difference in sets of drivers, the advice given seems reasonable.

Determining which R architectures are installed

How does one determine which architectures are supported by an installation of R? On a standard Windows install, one may look for the existence of R_HOME/bin/*/R.exe where * is the architecture (typically i386 or x64). On a standard Mac install from CRAN, there are no subdirectories.
I can query R for the default architecture using something like:
$ R --silent -e "sessionInfo()[[1]][[2]]"
> sessionInfo()[[1]][[2]]
[1] "x86_64"
but how do I know on Mac/Linux whether any sub-architectures are installed, and if so what they are?
R.version, R.Version(), R.version.string, and version provide detailed information about the version of R running.
Update, based on a better understanding of the question. This isn't a complete solution, but it seems you can get fairly close via a combination of the following commands:
# get all the installed architectures
arch <- basename(list.dirs(R.home('bin'), recursive=FALSE))
# handle different operating systems
if (.Platform$OS.type == "unix") {
  arch <- gsub("exec", "", arch)
  if (arch == "")
    arch <- R.version$arch
} else { # Windows
  # any special handling
}
Note that this won't work if you've built R from source and installed the different architectures in various different places. See section 2.6, Sub-architectures, of the R Installation and Administration manual for more details.
Using Sys.info() you have a lot of information on your system.
Maybe it can help here:
Sys.info()["machine"]
machine
"x86_64"
EDIT
One workaround to see all the architectures in the wild is to download the log files from the RStudio CRAN mirror; it's not complete, but it gives a good estimate of what you need.
start <- as.Date('2012-10-01')
today <- as.Date('2013-07-01')
all_days <- seq(start, today, by = 'day')
year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')
files <- file.path("/tmp", basename(urls))
# download each day's log before reading it
mapply(download.file, urls, files)
list_data <- lapply(files, read.csv, stringsAsFactors = FALSE)
data <- do.call(rbind, list_data)
str(data)
## 'data.frame': 10694506 obs. of 10 variables:
## $ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
## $ time : chr "00:30:13" "00:30:15" "02:30:16" "02:30:16" ...
## $ size : int 35165 212967 167199 21164 11046 42294 435407 326143 119459 868695 ...
## $ r_version: chr "2.15.1" "2.15.1" "2.15.1" "2.15.1" ...
## $ r_arch : chr "i686" "i686" "x86_64" "x86_64" ...
## $ r_os : chr "linux-gnu" "linux-gnu" "linux-gnu" "linux-gnu" ...
## $ package : chr "quadprog" "lavaan" "formatR" "stringr" ...
## $ version : chr "1.5-4" "0.5-9" "0.6" "0.6.1" ...
## $ country : chr "AU" "AU" "US" "US" ...
## $ ip_id : int 1 1 2 2 2 2 2 1 1 3 ...
unique(data[["r_arch"]])
## [1] "i686" "x86_64" NA "i386" "i486"
## [6] "i586" "armv7l" "amd64" "000000" "powerpc64"
## [11] "armv6l" "sparc" "powerpc" "arm" "armv5tel"

How to convert searchTwitter results (from library(twitteR)) into a data.frame?

I am working on saving twitter search results into a database (SQL Server) and am getting an error when I pull the search results from twitteR.
If I execute:
library(twitteR)
puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))
I get an error of:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class structure("status", package = "twitteR") into a data.frame
This is important because in order to use RODBC to add this to a table using sqlSave it needs to be a data.frame. At least that's the error message I got:
Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging", :
should be a data frame
So does anyone have any suggestions on how to coerce the list to a data.frame or how I can load the list through RODBC?
My final goal is to have a table that mirrors the structure of values returned by searchTwitter. Here is an example of what I am trying to retrieve and load:
library(twitteR)
puppy <- searchTwitter("puppy", session=getCurlHandle(),num=2)
str(puppy)
List of 2
$ :Formal class 'status' [package "twitteR"] with 10 slots
.. ..@ text : chr "beautifull and kc reg Beagle Mix for rehomes: This little puppy is looking for a new loving family wh... http://bit.ly/9stN7V "| __truncated__
.. ..@ favorited : logi FALSE
.. ..@ replyToSN : chr(0)
.. ..@ created : chr "Wed, 16 Jun 2010 19:04:03 +0000"
.. ..@ truncated : logi FALSE
.. ..@ replyToSID : num(0)
.. ..@ id : num 1.63e+10
.. ..@ replyToUID : num(0)
.. ..@ statusSource: chr "<a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a>"
.. ..@ screenName : chr "puppy_ads"
$ :Formal class 'status' [package "twitteR"] with 10 slots
.. ..@ text : chr "the cutest puppy followed me on my walk, my grandma won't let me keep it. taking it to the pound sadface"
.. ..@ favorited : logi FALSE
.. ..@ replyToSN : chr(0)
.. ..@ created : chr "Wed, 16 Jun 2010 19:04:01 +0000"
.. ..@ truncated : logi FALSE
.. ..@ replyToSID : num(0)
.. ..@ id : num 1.63e+10
.. ..@ replyToUID : num(0)
.. ..@ statusSource: chr "<a href="http://blackberry.com/twitter" rel="nofollow">Twitter for BlackBerry®</a>"
.. ..@ screenName : chr "iamsweaters"
So I think the data.frame of puppy should have column names like:
- text
- favorited
- replyToSN
- created
- truncated
- replyToSID
- id
- replyToUID
- statusSource
- screenName
I use this code I found from http://blog.ouseful.info/2011/11/09/getting-started-with-twitter-analysis-in-r/ a while ago:
#get data
tws<-searchTwitter('#keyword',n=10)
#make data frame
df <- do.call("rbind", lapply(tws, as.data.frame))
#write to csv file (or your RODBC code)
write.csv(df,file="twitterList.csv")
I know this is an old question, but still, here is what I think is a "modern" version to solve this. Just use the function twListToDF:
gvegayon <- getUser("gvegayon")
timeline <- userTimeline(gvegayon,n=400)
tl <- twListToDF(timeline)
Hope it helps
Try this:
library(plyr)
ldply(searchTwitter("#rstats", n=100), text)
twitteR returns an S4 class, so you need to either use one of its helper functions, or deal directly with its slots. You can see the slots by using unclass(), for instance:
unclass(searchTwitter("#rstats", n=100)[[1]])
These slots can be accessed directly as I do above by using the related functions (from the twitteR help: ?statusSource):
text Returns the text of the status
favorited Returns the favorited information for the status
replyToSN Returns the replyToSN slot for this status
created Retrieves the creation time of this status
truncated Returns the truncated information for this status
replyToSID Returns the replyToSID slot for this status
id Returns the id of this status
replyToUID Returns the replyToUID slot for this status
statusSource Returns the status source for this status
As I mentioned, it's my understanding that you will have to specify each of these fields yourself in the output. Here's an example using two of the fields:
> head(ldply(searchTwitter("#rstats", n=100),
function(x) data.frame(text=text(x), favorited=favorited(x))))
text
1 #statalgo how does that actually work? does it share mem between #rstats and postgresql?
2 #jaredlander Have you looked at PL/R? You can call #rstats from PostgreSQL: http://www.joeconway.com/plr/.
3 #CMastication I was hoping for a cool way to keep data in a DB and run the normal #rstats off that. Maybe a translator from R to SQL code.
4 The distribution of online data usage: AT&T has recently announced it will no longer http://goo.gl/fb/eTywd #rstat
5 #jaredlander not that I know of. Closest is sqldf package which allows #rstats and sqlite to share mem so transferring from DB to df is fast
6 #CMastication Can #rstats run on data in a DB?Not loading it in2 a dataframe or running SQL cmds but treating the DB as if it wr a dataframe
favorited
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
You could turn this into a function if you intend on doing it frequently.
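For example, a sketch of such a wrapper, using the field accessors listed in the twitteR help above; the function name and choice of fields are mine, and it is untested here since it needs live API access:

```r
library(plyr)
library(twitteR)

# hypothetical helper: run a search and return selected fields as a data.frame
tweetsToDF <- function(term, n = 100) {
  ldply(searchTwitter(term, n = n),
        function(x) data.frame(text      = text(x),
                               created   = created(x),
                               favorited = favorited(x)))
}

df <- tweetsToDF("#rstats", n = 50)
```

Adding or removing columns is just a matter of editing the inner data.frame call.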
For those that run into the same problem I did which was getting an error saying
Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double'
I simply changed the word text in
ldply(searchTwitter("#rstats", n=100), text)
to statusText, like so:
ldply(searchTwitter("#rstats", n=100), statusText)
Just a friendly heads-up :P
Here is a nice function to convert it into a DF.
TweetFrame <- function(searchTerm, maxTweets)
{
  tweetList <- searchTwitter(searchTerm, n=maxTweets)
  return(do.call("rbind", lapply(tweetList, as.data.frame)))
}
Use it as:
tweets <- TweetFrame(" ", n)
The twitteR package now includes a function twListToDF that will do this for you.
puppy_table <- twListToDF(puppy)
