Extract tweets containing a specific character or string from a list of accounts - r

I want to get the timeline (10 tweets, for example) of a list of profiles, but I want to get only the tweets which contain a specific character or string.
profiles<- c("a", "b", "c", "d")
keyword <- "apple"
tweets <- get_timeline(
  user = profiles,
  q = # I DON'T KNOW WHAT TO PUT HERE TO GET TWEETS THAT CONTAIN keyword:
      # can't use grepl() because the vector should be tweets...
      # maybe with an if statement but I can't find the syntax
  n = 10,
  since_id = NULL,
  max_id = NULL,
  home = FALSE,
  parse = TRUE,
  check = TRUE,
  retryonratelimit = NULL,
  verbose = TRUE,
  token = NULL)

You can use q = "a OR b OR c OR d OR apple". I recommend reading the official documentation of the API about which logical operators are available and how to use them. Note that rtweet doesn't use the Twitter API v2 yet (only for the streaming endpoints, as of the 1.1.0 release).
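If you'd rather filter locally, another option is to fetch the timelines first and then subset the returned data frame with grepl() on its text column. A minimal sketch (the stand-in data frame below takes the place of real get_timeline() output, which also exposes the tweet body in a text column):

```r
keyword <- "apple"

# Stand-in for the data frame get_timeline() would return:
tweets <- data.frame(
  screen_name = c("a", "a", "b"),
  text = c("I like apple pie", "nothing here", "apple season"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose text contains the keyword (case-insensitive):
matching <- tweets[grepl(keyword, tweets$text, ignore.case = TRUE), ]
```

This trades extra downloading for simplicity: you pull n tweets per profile and discard the ones that don't match.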

Unwanted period added to "in" when making a JSON from a data frame (R jsonlite)

When generating a JSON object from a data frame in R, "in" is converted to "in." and I'm not sure how to solve this. Any help would be appreciated!
library(jsonlite)
query_json3 <- data.frame("eClass" = c("Fermentation", "Sample", "ResultValue", "Experiment"),
                          "collection" = c("fermentations", "samples", "resultValues", "experiments"))
filters1 <- data.frame("field" = "attributes.experiment",
                       "value" = "EXP-EB-22-019")
filters2 <- data.frame("field" = "originID",
                       "in" = "fermentations.originID")
filters3 <- data.frame("field" = "SubjectID",
                       "in" = "samples.id")
filters4 <- data.frame("field" = "type",
                       "value" = "Small-Scale Screening")
query_json3[1, "filters"][[1]] <- list(filters1)
query_json3[2, "filters"][[1]] <- list(filters2)
query_json3[3, "filters"][[1]] <- list(filters3)
query_json3[4, "filters"][[1]] <- list(filters4)
toJSON(query_json3)
Output:
[{"eClass":"Fermentation","collection":"fermentations","filters":[{"field":"attributes.experiment","value":"EXP-EB-22-019"}]},{"eClass":"Sample","collection":"samples","filters":[{"field":"originID","in.":"fermentations.originID"}]},{"eClass":"ResultValue","collection":"resultValues","filters":[{"field":"SubjectID","in.":"samples.id"}]},{"eClass":"Experiment","collection":"experiments","filters":[{"field":"type","value":"Small-Scale Screening"}]}]
Desired output:
[{"eClass":"Fermentation","collection":"fermentations","filters":[{"field":"attributes.experiment","value":"EXP-EB-22-019"}]},{"eClass":"Sample","collection":"samples","filters":[{"field":"originID","in":"fermentations.originID"}]},{"eClass":"ResultValue","collection":"resultValues","filters":[{"field":"SubjectID","in":"samples.id"}]},{"eClass":"Experiment","collection":"experiments","filters":[{"field":"type","value":"Small-Scale Screening"}]}]
The culprit here is not jsonlite. in is a reserved word in R, and for that reason data.frame() appends the period to the column name. You can use check.names = FALSE:
data.frame("field" = "SubjectID",
           "in" = "samples.id")
#       field        in.
# 1 SubjectID samples.id
data.frame("field" = "SubjectID",
           "in" = "samples.id",
           check.names = FALSE)
#       field         in
# 1 SubjectID samples.id
The problem is not actually to do with jsonlite. If you look at your data frame filters3, you can see that the column name has already changed there, so jsonlite is doing exactly what it should. There are reserved words in R (see for example https://learnetutorials.com/r-programming/identifiers-constants-reserved-words), and "in" is one of them. Can you change the name of this field without breaking things elsewhere? If the name is causing problems here, it is likely to cause problems elsewhere too.
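Rebuilding one of the filter frames with check.names = FALSE shows the end-to-end effect: the "in" column name survives, so toJSON() emits the desired key.

```r
library(jsonlite)

# With check.names = FALSE, data.frame() leaves the reserved word "in"
# alone instead of mangling it to "in.":
filters2 <- data.frame("field" = "originID",
                       "in" = "fermentations.originID",
                       check.names = FALSE)
toJSON(filters2)
# [{"field":"originID","in":"fermentations.originID"}]
```

Applying the same argument to filters3 (and any other frame with an "in" column) before assembling query_json3 should produce the desired output shown above.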

I am trying to correct some spelling errors in variable names in R

I have two (2) datasets, one containing the list of stores and ID and the other containing details about sales.
There are errors in the spelling of some store names in the latter.
I wrote an R code to aid in solving this issue, but I am stuck.
Any help is appreciated.
To my understanding, I need to create a dictionary of correct spellings.
Please find below some of my attempts.
# fix missing values in store names
# define regular expression pattern
pat <- "^.*PARVIFLORA ?OM?A.*$"
gsub(pat, "Parviflora Łomża", sum_jan[, 1])
# using find and replace method
Replaces <- data.frame(from = c("PARVIFLORA ?OM?A", "PARVIFLORA ?WIEBODZIN"),
                       to = c("Parviflora Łomża", "Świebodzin"))
ABNewDF <- replace(data = sum_jan, Var = "sum_jan[, 1]", replaceData = Replaces,
                   from = "from", to = "to", exact = FALSE)
ABNewVector <- FindReplace(data = ABData, Var = "a", replaceData = Replaces,
                           from = "from", to = "to", vector = TRUE)
# using string
reported <- sum_jan
corrections <- rbind( c("PARVIFLORA ?WIEBODZIN", "Świebodzin"))
colnames(corrections) <- c("wrong", "right")
corrections
typos <- aregexec("PARVIFLORA ?WIEBODZIN", reported)
regmatches(reported, typos)
# dictionary of corrected misspellings
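One way to finish the dictionary idea from the comments is a named character vector mapping wrong spellings to corrected ones, with an exact-match lookup that falls back to the original value when no correction exists. A sketch (fix_names is a made-up helper; the store names are taken from the attempts above):

```r
# Dictionary of corrected misspellings: names are the wrong spellings,
# values are the corrected store names.
corrections <- c("PARVIFLORA ?OM?A"      = "Parviflora Łomża",
                 "PARVIFLORA ?WIEBODZIN" = "Parviflora Świebodzin")

fix_names <- function(x, dict) {
  hit <- dict[x]              # exact-match lookup by name; NA when absent
  ifelse(is.na(hit), x, hit)  # keep the original when there is no entry
}

stores <- c("PARVIFLORA ?OM?A", "OTHER STORE", "PARVIFLORA ?WIEBODZIN")
fix_names(stores, corrections)
# "Parviflora Łomża" "OTHER STORE" "Parviflora Świebodzin"
```

Applied to your data, that would be sum_jan[, 1] <- fix_names(sum_jan[, 1], corrections). Exact matching avoids the regex-escaping headaches of gsub() with "?" in the patterns; for fuzzy matching of typos, agrepl()/aregexec() remain an option.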

How can I insert non-repeated records into BigQuery using the bigrquery library?

I'm trying to insert a non-repeated record into BigQuery but keep receiving the error "Array specified for non-repeated field: record".
My question is: How can I insert non-repeated records into BigQuery using the bigrquery library?
If I have the following schema:
bqSchema <- bq_fields(list(
  bq_field(name = "record", type = "RECORD", fields = list(
    bq_field(name = "a", type = "INTEGER"),
    bq_field(name = "b", type = "STRING")
  ))
))
And this data frame:
df <- tibble(
  record = list(
    a = 1,
    b = "B"
  )
)
Inserting the data as below causes the error in BigQuery:
bq_perform_upload(bqTableObj, df, fields = bqSchema)
# Array specified for non-repeated field: record
I think this is in part because bigrquery converts the dataframe to JSON with jsonlite::stream_out(), but doesn't use the argument auto_unbox = TRUE, resulting in arrays, rather than objects. This results in the following newline delimited JSON being sent to BigQuery:
{"record": [1]}
{"record": ["B"]}
The correct NDJSON that should be sent to BigQuery I believe should be:
{"record": {"a": 1, "b": "B"}}
Has anyone had this problem before, or have ideas how I can resolve this?
You should try the following, where you set mode = "REPEATED":
bqSchema <- bq_fields(list(
  bq_field(name = "record", type = "RECORD", mode = "REPEATED",
           fields = list(bq_field(name = "a", type = "INTEGER"),
                         bq_field(name = "b", type = "STRING")))
))

Collect search occurrences with rscopus in r?

I have to make a lot of queries on Scopus. For this reason I need to automatize the process.
I have loaded "rscopus" Package and I wrote this code:
test <- generic_elsevier_api(query = "industry",
                             type = c("abstract"),
                             search_type = c("scopus"),
                             api_key = myLabel,
                             headers = NULL,
                             content_type = c("content"),
                             root_http = "https://api.elsevier.com",
                             http_end = NULL,
                             verbose = TRUE,
                             api_key_error = TRUE)
My goal is obtaining the number of occurrences of a particular query.
In this example, if I search for "industry", I want to obtain the number of search results of the query.
query occurrence
industry 1789
How could I do this?
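One possible route (a sketch, not verified against a live API key): the Scopus Search API reports the hit count as opensearch:totalResults inside the search-results element of the response, so you could read it out of the parsed content that generic_elsevier_api() returns. get_total_results below is a made-up helper, demonstrated on a mock of that structure in place of test$content:

```r
# Hypothetical helper: pull the hit count out of a parsed Scopus
# Search API response (search-results -> opensearch:totalResults).
get_total_results <- function(res_content) {
  as.integer(res_content[["search-results"]][["opensearch:totalResults"]])
}

# Mock of the parsed structure, standing in for test$content:
mock_content <- list(
  `search-results` = list(`opensearch:totalResults` = "1789")
)
get_total_results(mock_content)
# 1789
```

Looping this over a vector of query strings and collecting query/count pairs into a data frame would give the table shown above.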

R: Loop to capture weather data from multiple stations

I wanted to perform a loop to capture weather data from multiple stations using the code below:
library(rwunderground)
sample_df <- data.frame(airportid = c("K6A2", "KAPA", "KASD", "KATL",
                                      "KBKF", "KBKF", "KCCO", "KDEN",
                                      "KFFC", "KFRG"),
                        stringsAsFactors = FALSE)
history_range(set_location(airport_code = sample_df$airportid),
              date_start = "20170815", date_end = "20170822",
              limit = 10, no_api = FALSE, use_metric = FALSE, key = get_api_key(),
              raw = FALSE, message = TRUE)
It won't work.
Currently, you are passing the entire vector (multiple character values) into the history_range call. Simply use lapply to pass the vector values iteratively and return a list of history_range() return objects. Below, a defined function passes the parameter; extend the function as needed to perform other operations.
capture_weather_data <- function(airport_id) {
  data <- history_range(set_location(airport_code = airport_id),
                        date_start = "20170815", date_end = "20170822",
                        limit = 10, no_api = FALSE, use_metric = FALSE,
                        key = get_api_key(),
                        raw = FALSE, message = TRUE)
  write.csv(data, paste0("/path/to/output/", airport_id, ".csv"))
  return(data)
}
data_list <- lapply(sample_df$airportid, capture_weather_data)
Also, name each item in list to the corresponding airport_id character value:
data_list <- setNames(data_list, sample_df$airportid)
data_list$K6A2 # 1st ITEM
data_list$KAPA # 2nd ITEM
data_list$KASD # 3rd ITEM
...
In fact, with sapply (the wrapper to lapply) you can generate list and name each item in same call but the input vector must be a character type (not factor):
data_list <- sapply(as.character(sample_df$airportid), capture_weather_data,
simplify=FALSE, USE.NAMES=TRUE)
names(data_list)
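If some stations error out (rate limits, missing data), it may also help to wrap each call in tryCatch() so one failure doesn't abort the whole run. A sketch, with fake_fetch standing in for the real history_range() call:

```r
# Run a downloader over many ids, turning per-id errors into NULLs
# instead of stopping the loop.
safe_capture <- function(airport_id, fetch_one) {
  tryCatch(fetch_one(airport_id),
           error = function(e) {
             message("failed for ", airport_id, ": ", conditionMessage(e))
             NULL
           })
}

# Demo with a fake fetcher in place of the real API call:
fake_fetch <- function(id) {
  if (id == "KBAD") stop("no data") else paste("data for", id)
}
res <- lapply(c("K6A2", "KBAD", "KAPA"), safe_capture, fetch_one = fake_fetch)
# res[[2]] is NULL; the other stations still came back
```

You can then drop the NULLs with Filter(Negate(is.null), res) before further processing.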
I think this history_range function that you brought up, from the rwunderground package as I understand, requires a weather underground API key. I went to the site and even signed up for it, but the email validation process in order to get a key (https://www.wunderground.com/weather/api) doesn't seem to be working correctly at the moment.
Instead, I went to the CRAN mirror (https://github.com/cran/rwunderground/blob/master/R/history.R) and, from what I understand, the function accepts only one string as the set_location argument. The example provided in the documentation is
history(set_location(airport_code = "SEA"), "20130101")
So what you should be doing as a "loop", instead, is
Note that as.vector() on a data frame does not produce a character vector, so index the airportid column directly:
ids <- sample_df$airportid
for (i in seq_along(ids)) {
  history_range(
    set_location(airport_code = ids[i]),
    date_start = "20170815", date_end = "20170822",
    limit = 10, no_api = FALSE, use_metric = FALSE,
    key = get_api_key(),
    raw = FALSE, message = TRUE)
}
If this doesn't work, let me know. (Ack, somebody also gave another answer to this question while I was typing this up.)
