Exporting R dataframes in excel spreadsheet [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I want to work using the fbRanks package in R, that's a package built to rank football/soccer teams and predict the number of goals during a match.
After I read the [pdf manual][1] describing that package provided by the author, I still don't understand how I can build the dataset in order to start analysing data.
In order to understand how the author built the dataset, I want to export the dataframe he used to do the analysis in an excel spreadsheet.
I already nstalled the xlsx package that should allow to do this, but, when I run
write.xlsx(dataframe_name, "path_in_windows")
R gives an error:
Error in is.data.frame(x) : object 'B00scores' not found
even if the dataframe exists.
Can somebody help me?
I'm not an expert user of R; if the question is not clear, please comment below.

The error suggest that not only have your installed the package but that you jave also loaded it into the current session. You should have offered code that showed how you created the object with the name 'dataframe_name', but we can see from the error that it had the value "B00scores". Again you should have shown how B00scores was created since is does not exist after:
> data('B00data', package='fbRanks')
> str(B00scores)
Error in str(B00scores) : object 'B00scores' not found
As it turns out, you are misspelling the name:
> str(B00.scores)
'data.frame': 2031 obs. of 9 variables:
$ date : Date, format: "2012-05-25" "2012-05-25" ...
$ home.team : chr "PacNW Blue B00" "WA Rush Swoosh B00" "Crossfire Premier B B00" "Normandy Park FC White B00" ...
$ home.score: num 0 1 12 4 1 10 0 4 7 5 ...
$ away.team : chr "Normandy Park FC White B00" "PacNW Maroon B00" "WA Rush Swoosh B00" "WA Rush Nike B00" ...
$ away.score: num 8 7 0 0 8 1 1 0 1 1 ...
$ venue : chr "PacNW Memorial Cup 2012" "PacNW Memorial Cup 2012" "PacNW Memorial Cup 2012" "PacNW Memorial Cup 2012" ...
$ home.adv : chr "neutral" "neutral" "neutral" "neutral" ...
$ away.adv : chr "neutral" "neutral" "neutral" "neutral" ...
$ surface : chr "Grass" "Grass" "Turf" "Grass" ...
So unlike the two voters to close for lack of clarity, I think the close vote should be for misspelling.

Related

Error in match.arg(opt_crit) : 'arg' must be NULL or a character vector

Error in match.arg(opt_crit) : 'arg' must be NULL or a character vector
occurs when trying to run my script in r.
I have tried to find the solution for it, but it seems to be pretty specific, and little help for me.
My dataset contains 3936 obs of 7 variables.
environment, skill, volume, datetime, year, month, day
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3696 obs. of 7 variables:
$ environment: chr "b2b" "b2b" "b2b" "b2b" ...
$ skill : chr "BO Bedrift" "BO Bedrift" "BO Bedrift" "BO Bedrift" ...
$ year : num 2017 2017 2017 2017 2017 ...
$ month : num 1 1 1 1 1 2 2 2 2 3 ...
$ day : num 2 9 16 23 30 6 13 20 27 6 ...
$ volume : num 360 312 305 222 113 ...
$ datetime : Date, format: "2017-01-02" "2017-01-09" "2017-01-16" "2017-01-23" ...
but when trying to run
volume_ets <- volume_tsbl %>% ETS(volume)
this message shows in the console
Error in match.arg(opt_crit) : 'arg' must be NULL or a character vector
I tried somewhat of a shortcut, but nothing helped,
volume_tsbl$volume <- as.numeric(as.character(volume_tsbl$volume))
Tried to run
volume_ets <- volume_tsbl %>% ETS(volume)
this message shows in the console
Error in match.arg(opt_crit) : 'arg' must be NULL or a character vector
I tried somewhat of a shortcut, but nothing helped,
volume_tsbl$volume <- as.numeric(as.character(volume_tsbl$volume))
volume_ets <- volume_tsbl %>% ETS(volume)
my tsibble looks like this;
volume_tsbl <- volume %>¤ as_tsibble(key = c(skill, environment), index = c(datetime), regular = TRUE )
Expected the code to run, but it does not.
This is the result of an interface change made in late 2018. The change was to make model functions (such as ETS()) create model definitions, rather than fitted models. Essentially, ETS() no longer accepts data as an input, and the specification for the ETS model would become ETS(volume).
The equivalent code in the current version of fable is:
volume_ets <- volume_tsbl %>% model(ETS(volume))
Where the model() function is used to train one or more model definitions (ETS(volume) in this case) to a given dataset.
You can refer to the pkgdown site for fable to see more details: http://fable.tidyverts.org/
In particular, the ETS() function is documented here: http://fable.tidyverts.org/reference/ETS.html

Using read_html in R to get Russell 3000 holdings?

I was wondering if there is a way to automatically pull the Russell 3000 holdings from the iShares website in R using the read_html (or rvest) function?
url: https://www.ishares.com/us/products/239714/ishares-russell-3000-etf
(all holdings in the table on the bottom, not just top 10)
So far I have had to copy and paste into an Excel document, save as a CSV, and use read_csv to create a tibble in R of the ticker, company name, and sector.
I have used read_html to pull the SP500 holdings from wikipedia, but can't seem to figure out the path I need to put in to have R automatically pull from iShares website (and there arent other reputable websites I've found with all ~3000 holdings). Here is the code used for SP500:
read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")%>%
html_node("table.wikitable")%>%
html_table()%>%
select('Symbol','Security','GICS Sector','GICS Sub Industry')%>%
as_tibble()
First post, sorry if it is hard to follow...
Any help would be much appreciated
Michael
IMPORTANT
According to the Terms & Conditions listed on BlackRock's website (here):
Use any robot, spider, intelligent agent, other automatic device, or manual process to search, monitor or copy this Website or the reports, data, information, content, software, products services, or other materials on, generated by or obtained from this Website, whether through links or otherwise (collectively, "Materials"), without BlackRock's permission, provided that generally available third-party web browsers may be used without such permission;
I suggest you ensure you are abiding by those terms before using their data in a way that violates those rules. For educational purposes, here is how data would be obtained:
First you need to get to the actual data (not the interactive javascript). How familiar are you with the devloper function on your browser? If you navigate through the webiste and track the traffic, you will notice a large AJAX:
https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?tab=all&fileType=json
This is the data you need (all). After locating this, it is just cleaning the data. Example:
library(jsonlite)
#Locate the raw data by searching the Network traffic:
url="https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?tab=all&fileType=json"
#pull the data in via fromJSON
x<-jsonlite::fromJSON(url,flatten=TRUE)
>Large list (10.4 Mb)
#use a comination of `lapply` and `rapply` to unlist, structuring the results as one large list
y<-lapply(rapply(x, enquote, how="unlist"), eval)
>Large list (50677 elements, 6.9Mb)
y1<-y[1:15]
> str(y1)
List of 15
$ aaData1 : chr "MSFT"
$ aaData2 : chr "MICROSOFT CORP"
$ aaData3 : chr "Equity"
$ aaData.display: chr "2.95"
$ aaData.raw : num 2.95
$ aaData.display: chr "109.41"
$ aaData.raw : num 109
$ aaData.display: chr "2,615,449.00"
$ aaData.raw : int 2615449
$ aaData.display: chr "$286,156,275.09"
$ aaData.raw : num 2.86e+08
$ aaData.display: chr "286,156,275.09"
$ aaData.raw : num 2.86e+08
$ aaData14 : chr "Information Technology"
$ aaData15 : chr "2588173"
**Updated: In case you are unable to clean the data, here you are:
testdf<- data.frame(matrix(unlist(y), nrow=50677, byrow=T),stringsAsFactors=FALSE)
#Where we want to break the DF at (every nth row)
breaks <- 17
#number of rows in full DF
nbr.row <- nrow(testdf)
repeats<- rep(1:ceiling(nbr.row/breaks),each=breaks)[1:nbr.row]
#split DF from clean-up
newDF <- split(testdf,repeats)
Result:
> str(head(newDF))
List of 6
$ 1:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "MSFT" "MICROSOFT CORP" "Equity" "2.95" ...
$ 2:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "AAPL" "APPLE INC" "Equity" "2.89" ...
$ 3:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "AMZN" "AMAZON COM INC" "Equity" "2.34" ...
$ 4:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "BRKB" "BERKSHIRE HATHAWAY INC CLASS B" "Equity" "1.42" ...
$ 5:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "FB" "FACEBOOK CLASS A INC" "Equity" "1.35" ...
$ 6:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "JNJ" "JOHNSON & JOHNSON" "Equity" "1.29" ...

Getting {xml_nodeset (0)} when using html_nodes from rvest package in R

I am trying to scrape headlines off a few news websites using the html_node function and the SelectorGadget but find that some do not work giving the result "{xml_nodeset (0)}". For example the below code gives such result:
url_cnn = 'https://edition.cnn.com/'
webpage_cnn = read_html(url_cnn)
headlines_html_cnn = html_nodes(webpage_cnn,'.cd__headline-text')
headlines_html_cnn
The ".cd__headline-text" I got using the SelectorGadget.
Other websites work such as:
url_cnbc = 'https://www.cnbc.com/world/?region=world'
webpage_cnbc = read_html(url_cnbc)
headlines_html_cnbc = html_nodes(webpage_cnbc,'.headline')
headlines_html_cnbc
Gives a full set of headlines. Any ideas why some websites return the "{xml_nodeset (0)}" result?
Please, please, please stop using Selector Gadget. I know Hadley swears by it but he's 100% wrong. What you see with Selector Gadget is what's been created in the DOM after javascript has been executed and other resources have been loaded asynchronously. Please use "View Source". That's what you get when you use read_html().
Having said that, I'm impressed CNN is as generous as they are (you def can scrape this page) and the content is most certainly on that page, just not rendered (which is likely even better):
Now, that's javascript, not JSON so we'll need some help from the V8 package:
library(rvest)
library(V8)
ctx <- v8()
# get the page source
pg <- read_html("https://edition.cnn.com/")
# find the node with the data in a <script> tag
html_node(pg, xpath=".//script[contains(., 'var CNN = CNN || {};CNN.isWebview')]") %>%
html_text() %>% # get the plaintext
ctx$eval() # sent it to V8 to execute it
cnn <- ctx$get("CNN") # get the data ^^ just created
After exploring the cnn object:
str(cnn[["contentModel"]][["siblings"]][["articleList"]], 1)
## 'data.frame': 55 obs. of 7 variables:
## $ uri : chr "/2018/11/16/politics/cia-assessment-khashoggi-assassination-saudi-arabia/index.html" "/2018/11/16/politics/hunt-crown-prince-saudi-un-resolution/index.html" "/2018/11/15/politics/us-khashoggi-sanctions/index.html" "/2018/11/15/middleeast/jamal-khashoggi-saudi-prosecutor-death-penalty-intl/index.html" ...
## $ headline : chr "<strong>CIA determines Saudi Crown Prince personally ordered journalist's death, senior US official says</strong>" "Saudi crown prince's 'fit' over UN resolution" "US issues sanctions on 17 Saudis over Khashoggi murder" "Saudi prosecutor seeks death penalty for Khashoggi killers" ...
## $ thumbnail : chr "//cdn.cnn.com/cnnnext/dam/assets/181025083025-prince-mohammed-bin-salman-small-11.jpg" "//cdn.cnn.com/cnnnext/dam/assets/181025083025-prince-mohammed-bin-salman-small-11.jpg" "//cdn.cnn.com/cnnnext/dam/assets/181025171830-jamal-khashoggi-small-11.jpg" "//cdn.cnn.com/cnnnext/dam/assets/181025171830-jamal-khashoggi-small-11.jpg" ...
## $ duration : chr "" "" "" "" ...
## $ description: chr "The CIA has determined that Saudi Crown Prince Mohammed bin Salman personally ordered the killing of journalist"| __truncated__ "Multiple sources tell CNN that a much-anticipated United Nations Security Council resolution calling for a cess"| __truncated__ "The Trump administration on Thursday imposed penalties on 17 individuals over their alleged roles in the <a hre"| __truncated__ "Saudi prosecutors said Thursday they would seek the death penalty for five people allegedly involved in the mur"| __truncated__ ...
## $ layout : chr "" "" "" "" ...
## $ iconType : chr NA NA NA NA ...

How to properly read in text data

I am just getting started with text analysis in r. By reading in some example text data I get the following result.
sms_raw <- read.csv("sms_spam.csv", stringsAsFactors = FALSE)
> str(sms_raw)
'data.frame': 5559 obs. of 2 variables:
$ type : chr "ham" "ham" "ham" "spam,\"complimentary 4 STAR Ibiza
Holiday or £10,000 cash needs your URGENT collection. 09066364349 NOW from
Landline not to l"| __truncated__ ...
$ text.........: chr "Hope you are having a good week. Just checking
in;;;;;;;;;" "K..give back my thanks.;;;;;;;;;" "Am also doing in cbe only.
But have to pay.;;;;;;;;;" "" ...
It seems to me as if the variables are not getting seperated properly. Further analyzing the data with the head function I get the following result:
head(sms_raw)
type
1
ham
2
ham
3
ham
4 spam,"complimentary 4 STAR Ibiza Holiday or £10,000 cash needs your
URGENT collection. 09066364349 NOW from Landline not to lose out!
Box434SK38WP150PPM18+";;;;;;;;;
5
spam
6
ham
text.........
1
Hope you are having a good week. Just checking in;;;;;;;;;
2
K..give back my thanks.;;;;;;;;;
3
Am also doing in cbe only. But have to pay.;;;;;;;;;
Does anybody have suggestions how to resolve this?
Try data.table::fread("sms_spam.csv", stringsAsFactors = FALSE,sep=";")
EDIT
you can try:
input_file<-readLines("/path/of/sms_spam.csv")

Data importing Delimiter issue in R

I am trying to import a text file into R, and put it into a data frame, along with other data.
My delimiter is "|" and a sample of my data is here :
|Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD,
very light load and had 3 seats to myself. A very enthusiastic and friendly crew as usual on this transpacific
route that I take several times a year. Arrived 20 min ahead of schedule. The expected high level of service from
our flag carrier, Air Canada. Altitude Elite member.
|We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited
staffing in Toronto our flight was excellent. Due to the rush in Toronto one of our carry ones was placed to go in
the cargo hold. When we arrived in Winnipeg it stayed in Toronto, they were most helpful and kind at the Winnipeg
airport, and we received 3 phone calls the following day in regards to the misplaced bag and it was delivered to
our home. We are very thankful and more than appreciative of the service we received what a great end to a
wonderful holiday.
|Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which
had no storage whatsoever, and not even any room under the seats. Ridiculous. Crew were poor, not friendly. One
older male member of staff was quite attitudinal, acting as though he was doing everyone a huge favour by serving
them. A reasonable dinner but breakfast was a measly piece of banana loaf. That's it! The worst airline breakfast
I have had.
As you can see, there are many "|" , but as this screenshot below shows, when I imported the data in R, it only separated it once, instead of about 152 times.
How do I get each individual piece of text in a different column inside the data frame? I would like a data frame of length 152, not 2.
EDIT: The code lines are:
myData <- read.table("C:/Users/Norbert/Desktop/research/Important files/Airline Reviews/Reviews/air_can_Review.txt", sep="|",quote=NULL, comment='',fill = TRUE, header=FALSE)
length(myData)
[1] 2
class(myData)
[1] "data.frame"
str(myData)
'data.frame': 1244 obs. of 2 variables:
$ V1: Factor w/ 1093 levels "","'delayed' on departure (I reference flights between March 2014 and January 2015 in this regard: Denver, SFO,",..: 210 367 698 853 1 344 483 87 757 52 ...
$ V2: Factor w/ 154 levels ""," hotel","5/9/2014, LHR to Vancouver, AC855. 23/9/2014, Vancouver to LHR, AC854. For Economy the leg room was OK compared to",..: 1 1 1 1 78 1 1 1 1 1 ...
myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue", stringsAsFactors = FALSE)
str(myDataFrame)
'data.frame': 531 obs. of 3 variables:
$ text : chr "BRU-YUL, May 26th, A330-300. Departed on-time, landed 30 minutes late due to strong winds, nice flight, food" "excellent, cabin-crew smiling and attentive except for one old lady throwing meal trays like boomerangs. Seat-" "pitch was very generous, comfortable seat, IFE a bit outdated but selection was Okay. Air Canadas problem is\nthat the new pro"| __truncated__ "" ...
$ otherVar2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ otherVar2.1: chr "blue" "blue" "blue" "blue" ...
length(myDataFrame)
[1] 3
A better way to read in the text is using scan(), and then put it into a data frame with your other variables (here I just made some up). Note that I took your text above, and pasted it into a file called sample.txt, after removing the starting "|".
myData <- scan("sample.txt", what = "character", sep = "|")
myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue",
stringsAsFactors = FALSE)
str(myDataFrame)
## 'data.frame': 3 obs. of 3 variables:
## $ text : chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__ "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__ "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__
## $ otherVar2 : num 1 1 1
## $ otherVar2.1: Factor w/ 1 level "blue": 1 1 1
The otherVar1, otherVar2 are just placeholders for your own variables, as you said you wanted a data.frame with other variables. I chose an integer variable and a text variable, and by specifying a single value, it gets recycled for all observations in the dataset (in the example, 3).
I realize that your question asks how to get each text in a different column, but that is not a good way to use a data.frame, since data.frames are designed to hold variables in columns. (With one text per column, you cannot add other variables.)
If you really want to do that, you have to coerce the data after transposing it, as follows:
myDataFrame <- as.data.frame(t(data.frame(text = myData, stringsAsFactors = FALSE)), stringsAsFactors = FALSE)
str(myDataFrame)
## 'data.frame': 1 obs. of 3 variables:
## $ V1: chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__
## $ V2: chr "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__
## $ V3: chr "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__
length(myDataFrame)
## [1] 3
"Measly banana loaf"? Definitely economy class.

Resources