I currently run an online hockey league for NHL95. I have everything I need, except for the part of the standings where it displays the last 10 games. Like a team is 3-4-3-0 in their last 10 games played. I can't seem to get the formula to work.
Currently this is the structure in which I display the W,L,T,OTL,OTW (which can be counted as a W)
Ive tried countif, vlookup, sumif, sort. Just can't seem to get the right combinations. Mainly, got an error, or a figure that I didnt want displayed.
Related
I am using the word_associate package in R Markdown to create word clouds across a grouping variable with multiple categories. I would like the titles of each word cloud to be drawn from the character values of the grouping variable.
I have added trans_cloud(title=TRUE) to my code, but have not been able to resolve my problem. Here's my code, which runs but doesn't produce graphs with titles:
library(qdap)
word_associate(df$text, match.string=c("cat"), grouping.var=c(df$id),
text.unit="sentence", stopwords=c(Top200Words), wordcloud=TRUE, cloud.colors=c("#0000ff","#FF0000"),
trans_cloud(title=TRUE))
I have also tried the following, which does not run:
library(qdap)
word_associate(df$text, match.string=c("cat"), grouping.var=c(df$id),
text.unit="sentence", stopwords=c(Top200Words), wordcloud=TRUE, cloud.colors=c("#0000ff","#FF0000"),
title=TRUE)
Can anyone help me figure this out? I can't find any guidance in the documentation and there's hardly any examples of or discussions about word_associate on the web.
Here's an example data frame that reproduces the problem:
id text
question1 I love cats even though I'm allergic to them.
question1 I hate cats because I'm allergic to them.
question1 Cats are funny, cute, and sweet.
question1 Cats are mean and they scratch me.
question2 I only pet cats when I have my epipen nearby.
question2 I avoid petting cats at all cost.
question2 I visit a cat cafe every week. They have 100 cats.
question2 I tried to pet my friend's cat and it attacked me.
Note that if I run this in R (instead of Markdown), the figures automatically print the "question1_list1" and "question2_list1" in bright blue at the top of the figure file. This doesn't work for me because I need the titles to exclude "_list1" and be written in black. These automatically generated titles do not respond to changes in my trans_cloud specifications.
Ex:
library(qdap)
word_associate(df$text, match.string=c("cat"), grouping.var=c(df$id),
text.unit="sentence", stopwords=c(Top200Words), wordcloud=TRUE, cloud.colors=c("#0000ff","#FF0000"),
trans_cloud(title=TRUE, title.color="#000000"))
In addition, I'm locked in to using this package (as opposed to other options for creating word clouds in R) because I'm using it to create network plots, too.
I am scraping Amazon customer reviews using R and have come across a bug that I was hoping someone might have some insight into.
I have noticed that R fails to scrape the specified node (found by using SelectorGadget) from all reviews. Each time I run the script I retrieve a different amount, but never the entirety. This is very frustrating since the goal is to scrape the reviews and compile them into csv files that can later be manipulated using R. Essentially, if a product has 200 reviews, when I run the script, sometimes I will get 150 reviews, sometimes 75 reviews, etc- but not the entire 200. This issue seems to happen after I have done repeated scraping.
I have also gotten a few timeout errors, specifically "Error in open.connection(x, "rb") : Timeout was reached".
How do I get around this to continue scraping? I am a beginner but any help or insight is greatly appreciated!!
url <- "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_show_all?ie=UTF8&reviewerType=all_reviews&pageNumber="
N_pages <- 204
A <- NULL
for (j in 1: N_pages){
pant <- read_html(paste0(url, j))
B <- cbind(pant %>% html_nodes(".review-text") %>% html_text() )
A <- rbind(A,B)
}
tail(A)
print(j)
Is this not working for you?
Setting the URL as "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_paging_btm_2?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="
N_pages <- 204
A <- NULL
for (j in 1: N_pages){
pant <- read_html(paste0(url, j))
B <- cbind(pant %>% html_nodes(".review-text") %>% html_text() )
A <- rbind(A,B)
}
tail(A)
[,1]
[1938,] "This is really a good item to get. Trendy, probably you can choose a different color, it fits good but I wouldn't say perfect."
[1939,] "I don't write reviews for most products, but I felt the need to do so for these pants for a couple reasons. First, they are great pants! Solid material, well-made, and they fit great. Second, I want to echo those who say you need to go up in size when you order. I wear anywhere from 32-34, depending on the brand. I ordered these in a 36 and they fit like a 33 or 34. I really love the look and feel of these, and will be ordering more!"
[1940,] "I bought the green one before, it is good quality and looks nice, than I purchased the similar one, but the khaki color, but received absolutely different product, different material. really disappointed."
[1941,] "These pants are great! I have been looking to update my wardrobe with a more edgy style; these cargo pants deliver on that. Paired with some casual sneakers or a decent nubuck leather boot completes the look from the waist down. The lazy-casual look is great when traveling, as are the many pockets. I wore these pants on a recent day trip to NYC and traveled comfortably with essential items contained in the 8 pockets. I placed a second order shortly after my first pair arrived because I like them so much. Shipping and delivery is also fairly fast, considering these pants ship from China!"
[1942,] "Pants are awesome, just like the picture. The size runs small, so if you order them I would order them bigger than normal. I usually wear a 34inch waist because i dont like my pants snug, these pants fit more like a 32 inch waist.Other than that i love them!"
[1943,] "the good:Pants are made from the durable cotton that has a nice feel; have a lot of useful features and roomy well placed pockets; durable stitching.the bad:Pants will shrink and drier/hot water is not recommended. Would have been better if the cotton was pretreated to prevent shrinking. I would gladly gave up the belt if I wouldn't have to wary about how to wash these pants.the ugly:faux pocket with a zipper. useless feature. on my pair came with a bright gold zipper, unlike a silver in a picture."
So, sometimes I need to get some data from the web organizing it into a dataframe and waste a lot of time doing it manually. I've been trying to figure out how to optimize this proccess, and I've tried with some R scraping approaches, but couldn't get to do it right and I thought there could be an easier way to do this, can anyone help me out with this?
Fictional exercise:
Here's a webpage with countries listed by continents: https://simple.wikipedia.org/wiki/List_of_countries_by_continents
Each country name is also a link that leads to another webpage (specific of each country, e.g. https://simple.wikipedia.org/wiki/Angola).
I would like as a final result to get a data frame with number of observations (rows) = number of countries listed and 4 variables (colums) as ID=Country Name, Continent=Continent it belongs to, Language=Official language (from the specific webpage of the Countries) and Population = most recent population count (from the specific webpage of the Countries).
Which steps should I follow in R in order to be able to reach to the final data frame?
This will probably get you most of the way. You'll want to play around with the different nodes and probably do some string manipulation (clean up) after you download what you need.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
I am trying to automatically make a big corpus into a numeric list. One number per line. For example I have the following data:
Df.txt =
In the years thereafter, most of the Oil fields and platforms were named after pagan “gods”.
We love you Mr. Brown.
Chad has been awesome with the kids and holding down the fort while I work later than usual! The kids have been busy together playing Skylander on the XBox together, after Kyan cashed in his $$$ from his piggy bank. He wanted that game so bad and used his gift card from his birthday he has been saving and the money to get it (he never taps into that thing either, that is how we know he wanted it so bad). We made him count all of his money to make sure that he had enough! It was very cute to watch his reaction when he realized he did! He also does a very good job of letting Lola feel like she is playing too, by letting her switch out the characters! She loves it almost as much as him.
so anyways, i am going to share some home decor inspiration that i have been storing in my folder on the puter. i have all these amazing images stored away ready to come to life when we get our home.
With graduation season right around the corner, Nancy has whipped up a fun set to help you out with not only your graduation cards and gifts, but any occasion that brings on a change in one's life. I stamped the images in Memento Tuxedo Black and cut them out with circle Nestabilities. I embossed the kraft and red cardstock with TE's new Stars Impressions Plate, which is double sided and gives you 2 fantastic patterns. You can see how to use the Impressions Plates in this tutorial Taylor created. Just one pass through your die cut machine using the Embossing Pad Kit is all you need to do - super easy!
If you have an alternative argument, let's hear it! :)
First I read the text using the command readLines:
text <- readLines("Df.txt", encoding = "UTF-8")
Secondly I get all the text into lower letters and I remove unnecessary spacing:
## Lower cases input:
lower_text <- tolower(text)
## removing leading and trailing spaces:
Spaces_remove <- str_trim(lower_text)
From here on, I will like to assign each line a number e.g.:
"In the years thereafter, most of the Oil fields and platforms were named after pagan “gods”." = 1
"We love you Mr. Brown." = 2
...
"If you have an alternative argument, let's hear it! :)" = 6
Any ideas?
You already do kinda have numeric line # associations with the vector (it's indexed numerically), but…
text_input <- 'In the years thereafter, most of the Oil fields and platforms were named after pagan “gods”.
We love you Mr. Brown.
Chad has been awesome with the kids and holding down the fort while I work later than usual! The kids have been busy together playing Skylander on the XBox together, after Kyan cashed in his $$$ from his piggy bank. He wanted that game so bad and used his gift card from his birthday he has been saving and the money to get it (he never taps into that thing either, that is how we know he wanted it so bad). We made him count all of his money to make sure that he had enough! It was very cute to watch his reaction when he realized he did! He also does a very good job of letting Lola feel like she is playing too, by letting her switch out the characters! She loves it almost as much as him.
so anyways, i am going to share some home decor inspiration that i have been storing in my folder on the puter. i have all these amazing images stored away ready to come to life when we get our home.
With graduation season right around the corner, Nancy has whipped up a fun set to help you out with not only your graduation cards and gifts, but any occasion that brings on a change in one\'s life. I stamped the images in Memento Tuxedo Black and cut them out with circle Nestabilities. I embossed the kraft and red cardstock with TE\'s new Stars Impressions Plate, which is double sided and gives you 2 fantastic patterns. You can see how to use the Impressions Plates in this tutorial Taylor created. Just one pass through your die cut machine using the Embossing Pad Kit is all you need to do - super easy!
If you have an alternative argument, let\'s hear it! :)'
library(dplyr)
library(purrr)
library(stringi)
textConnection(text_input) %>%
readLines(encoding="UTF-8") %>%
stri_trans_tolower() %>%
stri_trim() -> corpus
# data frame with explicit line # column
df <- data_frame(line_number=1:length(corpus), text=corpus)
# list with an explicit line number field
lst <- map(1:length(corpus), ~list(line_number=., text=corpus[.]))
# implicit list numeric ids
as.list(corpus)
# explicit list numeric id's (but they're really string keys)
setNames(as.list(corpus), 1:length(corpus))
# named vector
set_names(corpus, 1:length(corpus))
There are a plethora of R packages that significantly ease the burden of text processing/NLP ops. Doing this work outside of them is likely to be reinventing the wheel. The CRAN NLP Task View lists many of them.
I have a paragraph:
disgusting do at was horrific we have stayed please to at traveler photos ironic i did post those witnessed each every thing in pictures gave us fist free then moved us to rooms were any better we slept with clothes on entire there never once took off shoes to walk on carpet shower etc holes in wall stains on bedding curtains couch chair no working electric in lamps cords nothing could be plugged in when we called down to fix it so we no lighting except bathroom light tv toilets constantly plugged up shower drain.
That appears to be a little grammatically weird since I cleaned the paragraph. And I use the following code to extract work frequencies.
# create corpus
docs<-Corpus(VectorSource(example))
# stem document
docs<-tm_map(docs,stemDocument)
# create document-term matrix
dtm<-DocumentTermMatrix(docs)
# convert row names
rownames(dtm)<-"example"
# collapse matrix by summing over columns
freq<-colSums(as.matrix(dtm))
# length should be total number of terms
length(freq)
# create sort order (descending)
ord<-order(freq,decreasing=TRUE)
# list all terms in decreasing order of freq and write to disk
freq[ord]
Then the freq[ord] is:
I am wondering why there is a word ani here, apparently, ani does not appear in my paragraph. Thanks.
Just figured the problem, the following code transfers any to ani, does anyone know how to avoid that?
docs<-tm_map(docs,stemDocument)
It's the word "any" after having being stemmed. The (in this case faulty) logic of the underlying function, wordStem, which uses Dr. Martin Porter's stemming algorithm and the C libstemmer library generated by Snowball, changed the y to an i.