I'm trying to count the number of words in a text, but the count function is throwing an error. I'd be grateful for any help. Thanks!

library(tidytext)
library(dplyr)
anfarm %>%
  unnest_tokens(output = "word",
                input = "text_column",
                token = "words") %>%
  count(word, sort = TRUE)
#> Error in UseMethod("count") :
#> no applicable method for 'count' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
I extended the tidytext pipeline by adding a call to count(). I expected the text to be tokenized into words, followed by the number of occurrences of each word.
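A minimal sketch of the intended pipeline, assuming the error comes from another attached package masking dplyr's count(). That is a common cause of "no applicable method for 'count'", but it can't be confirmed without seeing the session. The small anfarm data frame below is a hypothetical stand-in for the real data. Calling dplyr::count() with its namespace prefix sidesteps any masking, and find("count") lists every attached package that exports an object named count.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidytext)

# Hypothetical stand-in for the asker's `anfarm` data frame
anfarm <- tibble(text_column = c("All animals are equal",
                                 "but some animals are more equal than others"))

# Lists every attached package with an object named `count`;
# if something other than dplyr comes first, it masks dplyr::count()
find("count")

word_counts <- anfarm %>%
  unnest_tokens(output = word, input = text_column, token = "words") %>%
  dplyr::count(word, sort = TRUE)  # namespace-qualified, so masking cannot interfere

word_counts
```

With this toy input, "animals", "are" and "equal" each appear twice and the other six words once.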

Related

no applicable method for 'lead' applied to an object of class "c('double', 'numeric')"

I'm writing code that runs several regression models (OLS, FE, RE) on financial firm data, but I get an error when defining new variables. With the code below, I'm trying to create lag/lead variables from variables I defined earlier in my code:
## Lead/Lag variables
library(dplyr)
df <- df[with(df, order(gvkey, fyear)), ]
df <- df %>%
  group_by(gvkey) %>%
  mutate(lagfyear = lag(fyear, 1),
         lagat = lag(at, 1),
         lagcash = lag(cash, 1),
         lagtang = lag(Q, 1),
         lagzscore = lag(zscore, 1),
         lagQ = lag(Q, 1),
         leadQ = lead(Q, 1),
         leadroa = lead(roa, 1),
         leadz = lead(zscore, 1),
         leadrd = lead(rd_lagat, 1),
         leadlogmktcap = lead(tang, 1),
         leadtang = lead(tang, 1),
         leadcf = lead(cf, 1),
         leadlogsale = lead(logsale, 1)) %>%
  as.data.frame()
I'm getting the following error:
Error: Problem with mutate() column leadQ.
i leadQ = lead(Q, 1).
x no applicable method for 'lead' applied to an object of class "c('double', 'numeric')"
i The error occurred in group 1: gvkey = 1004.
gvkey is the unique key assigned to each firm. Q is another variable that's been defined as market value/total replacement value of the firm.
This code works for my colleagues, so I'm not sure why it's not working for me. Any help is appreciated.
I think this has been answered before.
In short, I think you are picking up the wrong lead()/lag() functions from another package. Try using dplyr::lead() and dplyr::lag() instead.
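A minimal sketch of calling the dplyr versions explicitly, on a hypothetical toy firm-year panel (the real df has many more columns). With the dplyr:: prefix, a lead() or lag() exported by some other attached package cannot be picked up by mistake; find("lead") is one way to see which attached packages export a lead object.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical firm-year panel standing in for the asker's `df`
df <- data.frame(
  gvkey = c(1004, 1004, 1004, 1013, 1013),
  fyear = c(2001, 2002, 2003, 2001, 2002),
  Q     = c(1.2, 1.5, 1.1, 0.9, 1.3)
)

# Shows which attached packages export an object called `lead`
find("lead")

df_ll <- df %>%
  arrange(gvkey, fyear) %>%            # same ordering as df[with(df, order(gvkey, fyear)), ]
  group_by(gvkey) %>%
  mutate(lagQ  = dplyr::lag(Q, 1),     # namespace-qualified: ignores any masking lag()
         leadQ = dplyr::lead(Q, 1)) %>% # namespace-qualified: ignores any masking lead()
  ungroup() %>%
  as.data.frame()

df_ll
```

Because of group_by(gvkey), the first year of each firm gets NA for lagQ and the last year gets NA for leadQ, so values never leak across firms.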

error in R: Error in paste("CO", period) : cannot coerce type 'closure' to vector of type 'character'

I'm getting an error in my R program that says: Error in paste0("CO", period) :
cannot coerce type 'closure' to vector of type 'character'
Here's my code; I can't figure out where the error is coming from:
COdata <- COdata %>%
  filter(!is.na(ID)) %>%
  mutate(key = paste("CO", period)) %>%
  select(-latitude, -longitude, -period, -str.month,
         -str.day, -str.year, -end.month, -end.day, -end.year, -distance2,
         -stations2, -days2) %>%
  spread(key = key, value = CO.concentration)
What's odd is that when I run the first 4 commands, COdata is perfectly fine and the paste function creates the values I want. However, when I run all 5 commands, the error appears and points to the paste function.
We would need a sample from COdata to answer for sure, but it seems that at the time when you run this line, COdata does not include a period column.
Therefore, R searches outside the data for an object called period to paste, and the first one it finds is a function, probably lubridate::period(), hence the surprising error message.
To avoid this kind of misleading message, you could use the dplyr pronoun .data, which explicitly tells dplyr to look the variable up inside the data frame.
Here is an example:
library(dplyr, warn.conflicts=FALSE)
library(lubridate, warn.conflicts=FALSE)
iris %>% mutate(key=paste("a", period))
#> Error: Problem with `mutate()` input `key`.
#> x cannot coerce type 'closure' to vector of type 'character'
#> i Input `key` is `paste("a", period)`.
iris %>% mutate(key=paste("a", .data$period))
#> Error: Problem with `mutate()` input `key`.
#> x Column `period` not found in `.data`
#> i Input `key` is `paste("a", .data$period)`.
#Created on 2020-08-21 by the reprex package (v0.3.0)
With the .data pronoun, the error message is explicit: column period is not found in .data.
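Building on that, one plausible way for period to disappear (an assumption, since COdata isn't shown) is running the pipeline once with COdata <- ..., which drops period through select(), and then running it again on the overwritten object. A minimal sketch on hypothetical toy data that avoids this: the result goes to a new object (COdata_keyed is an illustrative name), and .data$period makes a missing column fail with a clear message.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data; the column names mirror the question
COdata <- data.frame(ID = c(1, 1, 2),
                     period = c("a", "b", "a"),
                     CO.concentration = c(0.3, 0.4, 0.2))

# Writing to a new name leaves COdata (and its period column) intact,
# so re-running this block gives the same result every time
COdata_keyed <- COdata %>%
  filter(!is.na(ID)) %>%
  mutate(key = paste("CO", .data$period)) %>%  # errors clearly if period is absent
  select(-period)

COdata_keyed$key
```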

Pivot Wider in R with NRC dict "no loop for break/next, jumping to top level"

I am new to R and am currently using it for data mining on Yelp reviews.
I am currently trying to pivot_wider on the NRC dictionary but keep getting the following error:
"Values in `idf` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(idf = list)` to suppress this warning.
* Use `values_fn = list(idf = length)` to identify where the duplicates arise
* Use `values_fn = list(idf = summary_fun)` to summarise duplicates
no loop for break/next, jumping to top level"
This is the code I am running for pivot_wider:
revDTM_sentiNRC <- rrSenti_nrc %>%
  pivot_wider(id_cols = c(review_id, stars),
              names_from = word,
              values_from = idf) %>%
  ungroup()
I have done the same with the AFINN and Bing dictionaries with success.
I have tried adding R's suggestions from the message to the code, but this does not work. I have also tried this code to remove the duplicates:
rrSenti_nrc <- group_by(review_id) %>% distinct(words, .keep_all = TRUE)
However, this gives me the following error:
"Error in group_by(review_id) : object 'review_id' not found"
I don't understand this, because the rest of my code is able to find 'review_id', but perhaps this is the wrong approach to the issue.
Thank you in advance for any suggestions/help.
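A minimal sketch of one fix, on hypothetical toy rows. The NRC lexicon assigns many words to more than one sentiment (a word can be both joy and positive), unlike AFINN, so after the join the same review_id/stars/word combination appears several times and pivot_wider() has no single idf value to place. The distinct() attempt above also needs the data frame passed into group_by(): group_by(review_id) on its own has no data to look in, hence "object 'review_id' not found". Since the pivot uses word, distinct() probably needs word rather than words. values_fill = 0 as a scalar assumes tidyr 1.1.0 or later.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical rows shaped like rrSenti_nrc after the NRC join:
# "good" matched two NRC sentiments, so review r1 has two "good" rows
rrSenti_nrc <- tibble(
  review_id = c("r1", "r1", "r1", "r2"),
  stars     = c(5, 5, 5, 2),
  word      = c("good", "good", "happy", "bad"),
  sentiment = c("joy", "positive", "joy", "negative"),
  idf       = c(0.7, 0.7, 1.1, 0.9)
)

revDTM_sentiNRC <- rrSenti_nrc %>%
  group_by(review_id) %>%                # the data frame comes in through the pipe
  distinct(word, .keep_all = TRUE) %>%   # keep one row per review/word pair
  ungroup() %>%
  pivot_wider(id_cols = c(review_id, stars),
              names_from = word,
              values_from = idf,
              values_fill = 0)           # words absent from a review become 0

revDTM_sentiNRC
```

Each cell now holds a single number instead of a list, which is what the AFINN and Bing versions produced.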

I need help tidying data for topic modelling

I am fairly new to R and I am having a problem with part of the text pre-processing and cleaning before topic modelling. I am trying to tokenise the text (the column is called text) to turn each document into a list of words; punctuation is removed as part of this process.
tokens <- text_input %>% unnest_tokens(words, text)
but I keep getting the error message
Error in UseMethod("unnest_tokens_") :
no applicable method for 'unnest_tokens_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
My text data currently looks like this:
text <chr> "mr smiths tenant called for support "
...
I need each document to be turned into a list of words so that spell checking etc. can be done, followed by topic modelling.
Code I have already tried:
A basic data frame called input, which then becomes text_input:
Database: spark_connection
$ lines <chr> " mr smiths tenant called for support "
# set the name of the column with your source text
text_col <- "lines"
## Basic cleaning
text_input <- input %>%
  filter(!is.na(!!as.name(text_col))) %>%
  mutate(text = trimws(!!as.name(text_col))) %>%
  mutate(text = tolower(text))
## Tokenise Text
## Turns each document into a list of words; punctuation is removed as part of this process
tokens <- text_input %>% unnest_tokens(words, text)
Error in UseMethod("unnest_tokens_") :
no applicable method for 'unnest_tokens_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
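The error says unnest_tokens() has no method for the tbl_spark class, i.e. the data is still a lazy Spark table rather than a data frame held in R. A common workaround, assuming the text fits in local memory, is to call dplyr's collect() before tokenising. The sketch below uses a small local tibble as a hypothetical stand-in for input, so collect() simply returns it unchanged; on a real tbl_spark it would pull the rows into R. The cleaning steps are the same as in the question.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidytext)

# Hypothetical local stand-in for the Spark table `input`
input <- tibble(lines = c(" mr smiths tenant called for support ",
                          "Tenant called again, support needed!"))
text_col <- "lines"

## Basic cleaning (same steps as in the question)
text_input <- input %>%
  filter(!is.na(!!as.name(text_col))) %>%
  mutate(text = trimws(!!as.name(text_col))) %>%
  mutate(text = tolower(text))

## Bring the rows into R, then tokenise; the tokenizer drops punctuation
tokens <- text_input %>%
  collect() %>%
  unnest_tokens(words, text)

tokens$words
```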

Object 'negative' not found error in R - During Sentiment Analysis

I am trying to do sentiment analysis in R on extracted tweets. The tweets are extracted fine, but I get this error when I process the tokenized tweets. The error is:
"Error in mutate_impl(.data, dots) :
Evaluation error: object 'negative' not found.".
This is the code I have written:
tokens %>%
  inner_join(get_sentiments("bing")) %>%  # pull out only sentiment words
  count(sentiment) %>%                    # count the # of positive & negative words
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
The error is thrown as soon as I include the mutate() statement.
I'd appreciate any help resolving this.
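One likely cause (an assumption, since the tweets aren't shown): if none of the tokens match a negative word in the Bing lexicon, count(sentiment) has no negative row, spread() creates no negative column, and mutate() then fails with "object 'negative' not found". A minimal sketch on hypothetical, already-joined data that uses tidyr's complete() to add any missing sentiment with a count of zero before spreading:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical result of inner_join(get_sentiments("bing")) in which,
# by chance, only positive words matched
scored <- tibble(word = c("good", "great"),
                 sentiment = c("positive", "positive"))

sentiment_summary <- scored %>%
  count(sentiment) %>%
  # add whichever of positive/negative is missing, with n = 0
  complete(sentiment = c("positive", "negative"), fill = list(n = 0L)) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

sentiment_summary
```

Because both columns are guaranteed to exist after complete(), the final mutate() works whether or not the tweets contain any negative words.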
