R-markdown error compiling with missing value on certain paragraph - r

Hi everyone, I have a problem with R Markdown.
I tried to compile the R code below into a PDF file, but it fails with an error that seems related to omitting NA values.
I use tinytex, by the way.
R version: 4.0.0
library(tidyr)
library(dplyr)
library(stringr) # needed for str_split_fixed() and str_sub() below
dl <- tempfile()
download.file("http://files.grouplens.org/datasets/movielens/ml-10m.zip", dl)
ratings <- read.table(text = gsub("::", "\t", readLines(unzip(dl, "ml-10M100K/ratings.dat"))),
col.names = c("userId", "movieId", "rating", "timestamp"))
movies <- str_split_fixed(readLines(unzip(dl, "ml-10M100K/movies.dat")), "\\::", 3)
colnames(movies) <- c("movieId", "title", "genres")
movies <- as.data.frame(movies) %>% mutate(movieId = as.numeric(levels(movieId))[movieId],
title = as.character(title),
genres = as.character(genres))
movielens <- left_join(ratings, movies, by = "movieId")
edx <- movielens[-test_index,]
edx <- edx %>% mutate(year = as.numeric(str_sub(title,-5,-2)))
split_edx <- edx %>% separate_rows(genres, sep = "\\|")
genres_popularity <- split_edx %>%
na.omit() %>% # omit missing values
select(movieId, year, genres) %>% # select columns we are interested in
mutate(genres = as.factor(genres)) %>% # turn genres in factors
group_by(year, genres) %>% # group data by year and genre
summarise(number = n()) %>% # count
complete(year = full_seq(year, 1), genres, fill = list(number = 0)) # add missing years/genres
I got this error:
Error in if (any(((x - rng[1])%%period > tol) & (period - (x - rng[1])%%period > :
missing value where TRUE/FALSE needed
Calls: ... dots_cols -> eval_tidy -> full_seq -> full_seq.numeric
Execution halted
This started happening after I installed tinytex and MiKTeX for R Markdown LaTeX output; before that, the code ran without problems.
Does anybody know why?

When I rerun your code and get to
edx %>% separate_rows(genres, sep = "\\|")
my computer takes forever to process the data. I will have to try later when I am home on my larger machine, and I will see whether I can help you then.
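For reference, the error text itself can be reproduced in base R whenever an `if` condition evaluates to NA. The sketch below uses a made-up `year` vector and a check of the same shape as the one in the error message. It only shows how an NA in `year` would produce this message; it does not establish why NA values would reach full_seq() only when knitting.

```r
# Hypothetical year vector: one title whose year could not be parsed
year <- c(1995, 1996, NA, 1998)

# A regularity check of the same shape as the one in the error message
rng <- range(year, na.rm = TRUE)
cond <- any((year - rng[1]) %% 1 > 1e-6)  # NA: no element is TRUE and one is NA

# if (NA) is an error in R; capture its message instead of stopping
msg <- tryCatch(
  if (cond) "irregular" else "regular",
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the NA values first turns the condition into a plain FALSE
clean <- year[!is.na(year)]
cond_clean <- any((clean - min(clean)) %% 1 > 1e-6)
print(cond_clean)
```

If this is what is happening, counting the missing years inside the knitted document (for example with sum(is.na(edx$year))) would confirm it.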

Related

How can I remove a specific symbol from an entire column

I am wondering how I can delete a specific symbol from an entire column. Here is what the original data look like: (image: original data)
The only elements I want to keep are the first words.
Here is what my full dataset looks like:
Background on the data is below.
library("dplyr")
library("stringr")
library("tidyverse")
library("ggplot2")
# load the .csv into R studio, you can do this 1 of 2 ways
#read.csv("the name of the .csv you downloaded from kaggle")
spotiify_origional <- read.csv("charts.csv")
spotiify_origional <- read.csv("https://raw.githubusercontent.com/info201a-au2022/project-group-1-section-aa/main/data/charts.csv")
View(spotiify_origional)
# filters down the data
# removes the track id, explicit, and duration columns
spotify_modify <- spotiify_origional %>%
select(name, country, date, position, streams, artists, genres = artist_genres)
#returns all the data just from 2022
#this is the data set you should use on the project
spotify_2022 <- spotify_modify %>%
filter(date >= "2022-01-01") %>%
arrange(date) %>%
group_by(date)
spotify_2022_global <- spotify_modify %>%
filter(date >= "2022-01-01") %>%
filter(country == "global") %>%
arrange(date) %>%
group_by(streams)
View(spotify_2022_global)
This is what I did:
top_15 <- spotify_2022_global[order(spotify_2022_global$streams, decreasing = TRUE), ]
top_15 <- top_15[1:15,]
top_15$streams <- as.numeric(top_15$streams)
View(top_15)
top_15 <- top_15 %>%
separate(genres, c("genres"), sep = ',')
top_15$genres<-gsub("]","",as.character(top_15$genres))
View(top_15)
And now the genres column looks like this: (image: name now look like this)
I tried using the same gsub function to remove the rest of the brackets and quotation marks, but it didn't work.
What should I do at this point? Any recommendations would be a huge help. Thank you!
You could do this with a combination of gsub() to remove the unwanted characters and stringr::word(), which is a nice way to extract a word.
w <- "[firstWord, secondWord, thirdWord]"
stringr::word(gsub('[\\[,\']', '', w),1)
#> [1] "firstWord"
This works also for w <- "['firstWord', 'secondWord', 'thirdWord']".
top_15$genres <- gsub("]|\\[|[']","",as.character(top_15$genres))
where the regex "]|\\[|[']" uses the | character (OR) to match any of several things, namely:
] closing square bracket
\\[ opening square bracket
['] single quotations
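As a quick check of the pattern above, here is a minimal base R sketch on a made-up value shaped like the genres column. The string `w` is an assumption for illustration, not taken from the dataset.

```r
# Made-up value in the same shape as the genres column
w <- "['pop', 'dance pop', 'post-teen pop']"

# Remove closing brackets, opening brackets and single quotes
no_brackets <- gsub("]|\\[|[']", "", w)
print(no_brackets)

# Keep only the text before the first comma, i.e. the first genre
first_genre <- trimws(sub(",.*$", "", no_brackets))
print(first_genre)
```

Both gsub() and sub() are vectorized, so the same two calls can be applied to a whole column at once.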
Tidying up the "This is what I did" code in tidyverse style gives you:
spotify_2022_global %>%
  arrange(desc(streams)) %>%
  head(15) %>%
  mutate(streams = as.numeric(streams),
         genres = gsub("]|\\[|[']", "", genres),     # remove brackets and quote marks (keep the commas)
         genres = trimws(sub(",.*$", "", genres)))   # keep only the first genre in each row

Using a for loop to get bulk tweets from the Twitter API v2 endpoint in R

I am trying to collect more tweets than is allowed in a single query, hence I am using a for loop to automate this.
tweets <- tibble()
for (i in 1:10) {
  # Note: url_tweet, headers and params are the same on every iteration as written
  response <- httr::GET(url = url_tweet,
                        httr::add_headers(.headers = headers),
                        query = params)
  obj <- httr::content(response, as = "text")
  json_data <- fromJSON(obj, flatten = TRUE)
  data_sep <- enframe(unlist(json_data)) %>%
    mutate(
      id2 = name %>% str_extract("[0-9]+$"), # ensure unique rows
      name = name %>% str_remove("[0-9]+$") %>% str_remove("^data.")
    ) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    select(`tweet_id` = id, text, user_id = includes.users.id, user_name = includes.users.username, likes = public_metrics.like_count, retweets = public_metrics.retweet_count, quotes = public_metrics.quote_count) %>%
    type_convert()
  tweets <- rbind(tweets, data_sep)
}
I have run each step individually and nothing is wrong with any of it, but when I run it in a loop I get this error:
Error in `select()`:
! Can't subset columns that don't exist.
x Column `id` doesn't exist.

R: Error in is_symbol(x) : object '.' not found (keras)

I am using the R programming language. I am trying to follow the R tutorial over here on neural networks (lstm) and time series: https://blogs.rstudio.com/ai/posts/2018-06-25-sunspots-lstm/
I decided to create my own time series data ("y.mon") for this tutorial (the same format and the same variable names) :
library(tidyverse)
library(glue)
library(forcats)
library(timetk)
library(tidyquant)
library(tibbletime)
library(cowplot)
library(recipes)
library(rsample)
library(yardstick)
library(keras)
library(tfruns)
library(dplyr)
library(lubridate)
library(tibbletime)
library(timetk)
index = seq(as.Date("1749/1/1"), as.Date("2016/1/1"),by="day")
index <- format(as.Date(index), "%Y/%m/%d")
value <- rnorm(97520,27,2.1)
final_data <- data.frame(index, value)
y.mon<-aggregate(value~format(as.Date(index),
format="%Y/%m"),data=final_data, FUN=sum)
y.mon$index = y.mon$`format(as.Date(index), format = "%Y/%m")`
y.mon$`format(as.Date(index), format = "%Y/%m")` = NULL
y.mon %>%
mutate(index = paste0(index, '/01')) %>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index) -> y.mon
From here on, I follow the instructions in the tutorial (replacing the "sun_spots" data with "y.mon"). Everything works fine until this point (I posted a question yesterday that got closed for being too detailed: https://stackoverflow.com/questions/65527230/r-error-in-is-symbolx-object-not-found-keras ; the code can be followed from the RStudio tutorial):
#ERROR
coln <- colnames(compare_train)[4:ncol(compare_train)]
cols <- map(coln, quo(sym(.)))
rsme_train <-
map_dbl(cols, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_train
Error in is_symbol(x) : object '.' not found
I found another Stack Overflow post which deals with a similar problem: "Getting error message while calculating rmse in a time series analysis"
According to that post, the first error can be resolved like this:
coln <- colnames(compare_train)[4:ncol(compare_train)]
rsme_train <-
map_df(coln, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>%
pull(.estimate) %>%
mean()
rsme_train
However, the next part of the tutorial contains similar code, and there the same error persists even after applying the correction:
compare_test %>% write_csv(str_replace(model_path, ".hdf5", ".test.csv"))
compare_test[FLAGS$n_timesteps:(FLAGS$n_timesteps + 10), c(2, 4:8)] %>% print()
cols <- map(coln, quo(sym(.)))
rsme_test <-
map_dbl(cols, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_test
#errors:
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
object 'model_path' not found
Error in is_symbol(x) : object '.' not found
These errors are preventing me from finishing the rest of the tutorial.
Can someone please show me how to fix these?
Thanks
Try passing coln directly to map_dbl instead of cols:
rsme_test <- map_dbl(coln, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
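For context, the `cols` object in the question appears to be meant as a list holding one symbol per column name. The base R sketch below builds such a list under that assumption. The column names are hypothetical, and whether rmse() accepts a symbol or a string for `estimate` depends on the installed yardstick version, which is not checked here.

```r
# Hypothetical prediction column names, standing in for
# colnames(compare_test)[4:ncol(compare_test)]
coln <- c("pred_test_1", "pred_test_2")

# One symbol per column name, built with base R only
cols <- lapply(coln, as.name)

print(cols[[1]])
print(is.symbol(cols[[1]]))
```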

Error when trying to load dl format using igraph

I am trying to load the Kapferer mine dataset into R using the igraph function "read_graph".
The code is very simple, but it throws an error.
test_g <-read_graph("http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/kapmine.dat", format = "dl")
Error in read.graph.dl(file, ...) : At foreign.c:3050 : syntax
error, unexpected $end, expecting DL in line 1, Parse error
As can be seen by following the link, the file does begin with DL. The only clue I can find is a message from 2015 which basically says to file a bug report.
Can dl files not be loaded by igraph at the moment, or is there some trick to it?
As there doesn't seem to be a clear way to load dl files, I have made a loader that seems to work well for the dl graphs on the Pajek website. The function is a bit scrappy and has not been extensively tested, but it may be useful to anyone who wants to use graphs that are not available in a more common format. If more up-to-date information on these datasets becomes available, this code can be ignored.
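# Requires the tidyverse (dplyr, tidyr, tibble, stringr, purrr) and igraph
library(tidyverse)
library(igraph)
load_dl_graph <- function(file_path, directed){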
raw_mat <- readLines(file_path) %>%
enframe()
row_labels_row <- grep( "ROW LABELS:", raw_mat$value)
column_labels_row <- grep( "COLUMN LABELS:", raw_mat$value)
level_labels_row <- grep("LEVEL LABELS:",raw_mat$value )
data_table_row <- grep( "DATA:", raw_mat$value)
row_labels <- raw_mat %>%
slice((row_labels_row+1):(column_labels_row-1)) %>%
select(from = value)
column_labels <- raw_mat %>%
slice((column_labels_row+1):(level_labels_row-1)) %>% pull(value)
table_levels <- raw_mat %>%
slice((level_labels_row+1):(data_table_row-1)) %>% pull(value)
data_df <- raw_mat %>%
slice((data_table_row+1):nrow(.)) %>%
select(value) %>%
mutate(value = str_squish(value)) %>%
separate(col = value, into = column_labels, sep = " ") %>%
mutate(table_id = rep(1:length(table_levels), each = nrow(.)/length(table_levels)))
tables_list <- 1:length(table_levels) %>%
map(~{
data_df %>%
filter(table_id ==.x) %>%
select(-table_id) %>%
bind_cols(row_labels,.) %>%
pivot_longer(cols = 2:ncol(.), names_to = "to", values_to = "values") %>%
filter(values ==1) %>%
select(-values) %>%
graph_from_data_frame(., directed = directed)
})
names(tables_list) <- table_levels
return(tables_list)
}

Map a tbl of hyperlinks into read_html

I have a tibble containing one column which stores a hyperlink in each row. Now I want to map over these links using map_dfr, passing the links one after another through read_html(.x[.x]) %>% html_node(".body-copy-lg") %>% html_text. If I do so, I always end up with this error:
Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) :
Expecting a single string value: [type=character; extent=3].
Which tells me that read_html is basically saying: "Hey, stop throwing more than one string at me at the same time."
So did I make a mistake in the mapper? Is this a bug? I really can't see why the mapper function does not grab each element one after another.
What I have tried so far:
target_regex <- "(xtm)|((k|K)(i|I|1|11)(d|D)(n|N).)|(Ar<e)\\s(you)\\s(in)|
(LOAN)|(AR(\\s|\\S)[0-9])|((B|b)(i|1|l)tc.)|(Coupon)|(Plastic.King)|(organs)|(SILI)|(Electric.Cigarette.Machine)"
adverts <- function(df) df[!grepl(target_regex, df$...1,perl = T), ]
bribe <- read_html(paste("http://ipaidabribe.com/reports/paid?page", 10, sep = "="))
report <- map(".read-more", ~html_nodes(bribe, .x) %>%
html_attr(.x[[1]][[1]][[1]], name = "href"))[[1]] %>%
as_tibble(.name_repair = "unique") %>%
bind_rows() %>%
rename( ...1 = value) %>%
adverts() %>%
map_dfr(~read_html(.x[.x]) %>%
html_node(".body-copy-lg") %>%
html_text)
Do not mind the call to rename(), which was only needed to make adverts() usable in this case.
You're forgetting that most functions in R are vectorized, so using map or apply functions is often unnecessary. In your case, it is only needed in the final step of getting the HTML text.
The syntax you are using in map is also puzzling, and I think you should review ?map to get a better handle on it. For instance, you use multiple .x or extracted values where you should just be using .x to refer to the sub-element of the object you are iterating over.
library(tidyverse)
library(rvest)
target_regex <- "(xtm)|((k|K)(i|I|1|11)(d|D)(n|N).)|(Ar<e)\\s(you)\\s(in)|
(LOAN)|(AR(\\s|\\S)[0-9])|((B|b)(i|1|l)tc.)|(Coupon)|(Plastic.King)|(organs)|(SILI)|(Electric.Cigarette.Machine)"
adverts <- function(df) df[!grepl(target_regex, df$...1,perl = T), ]
bribe <- read_html(paste("http://ipaidabribe.com/reports/paid?page", 10, sep = "="))
report <- html_nodes(bribe, ".read-more") %>%
html_attr("href") %>%
as_tibble(.name_repair = "unique") %>%
filter(str_detect(value, target_regex, negate = TRUE)) %>%
mutate(text = map_chr(value, ~read_html(.x) %>%
html_node(".body-copy-lg") %>%
html_text))
result
# A tibble: 3 x 2
value text
<chr> <chr>
1 http://ipaidabribe.com/reports/paid/paid-bribe-to-settle-matter… "\r\n Place: Nelamangala Police Station, Bangalore\nDate of incident: 5th Jan 2020, 3PM…
2 http://ipaidabribe.com/reports/paid/paid-500-rs-bribe-at-nizamu… "\r\n My Brother Mahesh Prasad travelling on PNR number 4822171124 train no 12721 Ni…
3 http://ipaidabribe.com/reports/paid/drone-air-follow-focus-wire… "\r\n This new Silencer Air+ is a tremendously versatile and resourceful follow focus, z…
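One way to see where the "extent=3" in the original error can come from: iterating over a data frame goes column by column, so each step receives the whole column rather than a single link. The base R sketch below shows this with lapply() on a made-up three-row data frame (the URLs are placeholders). purrr's map functions iterate over a data frame's columns in the same way.

```r
# Made-up data frame with one column of three placeholder links
links <- data.frame(
  value = c("http://example.com/a", "http://example.com/b", "http://example.com/c"),
  stringsAsFactors = FALSE
)

# Iterating over the data frame visits each COLUMN once:
# the function sees all three strings at the same time
per_column <- lapply(links, length)
print(per_column)

# Iterating over the column itself visits one link at a time
per_link <- vapply(links$value, function(u) length(u), integer(1), USE.NAMES = FALSE)
print(per_link)
```

This is why the answer maps over the `value` column inside mutate() instead of over the tibble.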
