Converting twitteR results to data frame - r

I have a simple for loop to write the past 100 tweets of a few usernames to .csv files:
library(twitteR)
mclist <- read.table('usernames.txt')
for (mc in mclist)
{
tweets <- userTimeline(mc, n = 100)
df <- do.call("rbind", lapply(tweets, as.data.frame))
write.csv(df, file=paste("Desktop/", mc, ".csv", sep = ""), row.names = F)
}
I mostly followed what I've read on StackOverflow but I continue to get this error message:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
Where did I go wrong?

I just cleaned up the code a bit, and everything started working.
Step 1: Let's set the working directory and load the 'twitteR' package.
library(twitteR)
setwd("C:/Users/Dinre/Desktop") # Replace with your desired directory
Step 2: First, we need to load a list of user names from a flat text file. I'm assuming that each line in the text file has one username, like so:
[contents of usernames.txt]
edclef
notch
dkanaga
Let's load it using the 'scan' function to read each line into a character vector:
mclist <- scan("usernames.txt", what="", sep="\n")
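Step 3: We'll loop through the usernames, just like you did before, but we're not going to refer to the directory, since we're going to use the same directory for output as input. The original code failed because read.table returns a data frame, so the loop ran once over the whole column of usernames and paste built several file names for a single write.csv call; that is what the "condition has length > 1" warning points at. Reading the names with scan gives a plain vector, so each pass of the loop sees exactly one username.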
for (mc in mclist){
tweets <- userTimeline(mc, n = 100)
df <- do.call("rbind", lapply(tweets, as.data.frame))
write.csv(df, file=paste(mc, ".csv", sep = ""), row.names = F)
}
I end up with three files on the desktop, and all the data seems to be correct.
edclef.csv
notch.csv
dkanaga.csv
Update: If you really want to refer to different directories within your code, use the '.' character to refer to the current working directory. For instance, if your working directory is your Windows user profile, you would refer to the 'Desktop' folder like so:
setwd("C:/Users/Dinre")
...
write.csv(df, file=paste("./Desktop/", mc, ".csv", sep = ""), row.names = F)
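For readers wondering why the original loop failed, here is a minimal sketch in base R. It assumes a usernames file with one name per line, written to a temporary file so that nothing on disk is touched; the variable names are only for illustration. It shows that a for loop over the result of read.table visits columns rather than usernames, while a loop over the result of scan visits one name at a time.

```r
# Write a small stand-in for usernames.txt to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("edclef", "notch", "dkanaga"), tmp)

# read.table returns a data frame; a for loop over it walks its columns.
as_table <- read.table(tmp, stringsAsFactors = FALSE)
seen_from_table <- list()
for (mc in as_table) {
  seen_from_table[[length(seen_from_table) + 1]] <- mc
}
# Only one pass happened, and mc held all three names at once, so paste
# produces three file names for what should be a single write.csv call.
paths_from_table <- paste(seen_from_table[[1]], ".csv", sep = "")

# scan returns a character vector; the loop now sees one name per pass.
as_vector <- scan(tmp, what = "", sep = "\n", quiet = TRUE)
seen_from_vector <- character(0)
for (mc in as_vector) {
  seen_from_vector <- c(seen_from_vector, paste(mc, ".csv", sep = ""))
}
```

Passing a length-three file argument to write.csv is what triggers the "invalid 'description' argument" error in the question.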

There's a convenience function in the package twListToDF which will handle the conversion of the list of tweets to a data.frame.

Since your mclist is a data.frame, you can replace your for by apply
apply( mclist, 1,function(mc){
tweets <- userTimeline(mc, n = 100)
df <- do.call("rbind", lapply(tweets, as.data.frame))
write.csv(df, file=paste("Desktop/", mc, ".csv", sep = ""), ##!! Change Desktop to
## something like Desktop/tweets/
row.names = F)
})
PS:
The userTimeline function will only work if the requested user has a public timeline, or if you have previously registered an OAuth object using registerTwitterOAuth.

Related

Error in file(file, "rt") : cannot open the connection - Unsure of what to do

I am currently working through Coursera's R Programming course and have hit a bit of a snag with this assignment. I have been getting various errors (which I'm not totally sure I've nailed down), but this is a new one, and no matter what I do I can't seem to shake it.
Whenever I run the below code it comes back with
Error in file(file, "rt") : cannot open the connection
pollutantmean <- function (directory, pollutant, id){
files<- list.files(path = directory, "/", full.names = TRUE)
dat <- data.frame()
dat <- sapply(file = directory,"/", read.csv)
mean(dat["pollutant"], na.rm = TRUE)
}
I have tried numerous different solutions posted here on SO for this issue but none of it has worked. I made sure that I am running after setting the working directory to the folder with all of the CSV files and I can see all of the files in the file pane. I have also moved that working directory around a few times since some of the suggestions were to put it on the desktop, etc. but none of that has worked. I am currently running R Studio as an admin but that does not seem to have done anything and I have also modified the permissions on the specdata file to ensure there's no weird restrictions there. Any help is appreciated.
Here are two possible implementations:
# list all files in "directory", read them, combine and then take the mean of the column named by "pollutant"
pollutantmean_1 <- function (directory, pollutant){
files <- list.files(path = directory, full.names = TRUE)
dat <- lapply(files, read.csv) # read each listed file, not the directory itself
dat <- data.table::rbindlist(dat) |> as.data.frame()
mean(dat[, pollutant], na.rm = TRUE)
}
# list all files in "directory", read them, take the mean of the "pollutant" column for each file and return them
pollutantmean_2 <- function (directory, pollutant){
files <- list.files(path = directory, full.names = TRUE)
dat <- lapply(files, read.csv) # read each listed file, not the directory itself
pollutant_means <- sapply(dat, function(x) mean(x[ , pollutant], na.rm = TRUE))
names(pollutant_means) <- basename(files) # label each mean with its file name
pollutant_means
}
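The question's function also takes an id argument that neither implementation uses. The following is only a sketch of one way to honour it in base R. It assumes that the sorted file list puts the file for monitor i at position i; the function name pollutantmean_by_id and the demo data are made up for illustration.

```r
# Sketch: mean of one pollutant column across the monitor files selected by id.
# Assumption: sort(list.files(...)) puts the file for monitor i at position i.
pollutantmean_by_id <- function(directory, pollutant, id) {
  files <- sort(list.files(path = directory, pattern = "\\.csv$", full.names = TRUE))
  files <- files[id]                              # keep only the requested monitors
  dat <- do.call(rbind, lapply(files, read.csv))  # stack the selected files
  mean(dat[[pollutant]], na.rm = TRUE)
}

# Made-up demo data in a fresh temporary directory.
demo_dir <- tempfile("specdata_demo")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, NA, 3)), file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7)), file.path(demo_dir, "002.csv"), row.names = FALSE)

demo_mean_all <- pollutantmean_by_id(demo_dir, "sulfate", 1:2)  # uses 1, 3, 5, 7
demo_mean_one <- pollutantmean_by_id(demo_dir, "sulfate", 2)    # uses 5, 7
```

Building full paths with full.names = TRUE is what avoids the "cannot open the connection" error, since read.csv then receives a path it can resolve from any working directory.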

Extract column from multiple csv files and merge into new data frame in R

I want to extract column called X1 out of 168 different .csv files, called table3_2, table3_3, table3_4, table3_5..., table3_168, all held in one folder (folder1). Then, merge into one new df. Contents of the column is factor.
Trying this code but can't get it to work.
folder1 <- "folder1"
folder2 <- "folder2" # destination folder
write_to <- function(file.name) {
file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
df <- read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = "/")[X1]
write.csv(df, file = past(folder2, file.name, sep= "/"))
}
files <- list.files(path = folder1, pattern = "*.csv")
lapply(X = paste(folder1, files, sep= "/"), write_to)
This comes up with the error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file folder1/folder1.csv: No such file or directory
So, I am not calling in the correct names of the table, and maybe not directing R to the correct folder (I've set the wd to folder1).
Any suggestions would be greatly appreciated.
Many thanks
There are a few minor issues that stand out, e.g. you have a typo in file = past(folder2, file.name, sep= "/") (should be paste() not past()) but perhaps a simpler approach would suit, e.g. using vroom:
library(vroom)
files <- fs::dir_ls(glob = "table3_*csv")
data <- vroom(files, id = "ID", col_select = c(ID, X1))
data
vroom_write(data, file = "~/xx/folder2/new_df.csv", delim = ",") # vroom_write defaults to tab-delimited output
# replace "xx" with your path
Does this approach solve your problem?
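If installing extra packages is not an option, the same idea can be sketched in base R. This assumes each file has a header row that yields a column named X1, and that the files follow the table3_*.csv naming from the question; collect_x1 and the demo folder are hypothetical names used only here.

```r
# Sketch: pull column X1 out of every table3_*.csv in a folder and stack the results.
collect_x1 <- function(folder) {
  files <- list.files(folder, pattern = "^table3_.*\\.csv$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    # Keep the source file name next to each value, like vroom's id column.
    data.frame(ID = basename(f), X1 = read.csv(f)[["X1"]], stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Made-up demo files in a fresh temporary folder.
demo_in <- tempfile("folder1_demo")
dir.create(demo_in)
write.csv(data.frame(X1 = c("a", "b"), X2 = 1:2), file.path(demo_in, "table3_2.csv"), row.names = FALSE)
write.csv(data.frame(X1 = "c", X2 = 3L), file.path(demo_in, "table3_3.csv"), row.names = FALSE)

combined <- collect_x1(demo_in)
```

Because list.files is called with full.names = TRUE, the folder name is prepended exactly once, which avoids the doubled path seen in the question's error message.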

Error in loop reading multiple text files

I'm a bit stuck with this code... The purpose is to read only the text files from a folder that holds a few different kinds of files, take one column from each, and build a data frame from the extracted columns (cbind.fill is a hand-made function that adds a new column and fills the "empty" spaces with NA values). Here is the code:
setwd("...folderOfInterest/")
genes_data <- data.frame()
for(i in list.files(pattern = "^GO_.*txt", full.names = TRUE)){
print(i) #this works perfectly, it only prints desired files...
q <- read.table(i, header = TRUE, sep = "\t", quote = NULL)
genes_data <- cbind.fill(genes_data, q[,2])
}
As @Adam B suggests, here is the print(i) output and a screenshot of the folder (folder_screenshot):
[1] "./GO_ALPHA_AMINO_ACID_CATABOLIC_PROCESS.xls"
[1] "./GO_ALPHA_AMINO_ACID_METABOLIC_PROCESS.xls"
[1] "./GO_ALPHA_BETA_T_CELL_ACTIVATION.xls"
[1] "./GO_AMINO_ACID_BETAINE_METABOLIC_PROCESS.xls"
[1] "./GO_AMINO_ACID_IMPORT.xls"
[1] "./GO_AMINO_ACID_TRANSMEMBRANE_TRANSPORT.xls"
[1] "./GO_AMINO_ACID_TRANSPORT.xls"
[1] "./GO_AMINOGLYCAN_BIOSYNTHETIC_PROCESS.xls"
[1] "./GO_ANGIOGENESIS.xls"
[1] "./GO_ANION_TRANSPORT.xls"
[1] "./GO_ANTIGEN_PROCESSING_AND_PRESENTATION.xls"
[1] "./GO_ANTIGEN_PROCESSING_AND_PRESENTATION_OF_ENDOGENOUS_ANTIGEN.xls"
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file './GO_ANTIGEN_PROCESSING_AND_PRESENTATION_OF_ENDOGENOUS_ANTIGEN.xls': No such file or directory
(note: the files' extension is .xls, but really they are .txt files)
It prompts this message:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file './GO_ANTIGEN_PROCESSING_AND_PRESENTATION.txt': No such file or directory
Running only q <- read.table(i, header = TRUE, sep = "\t", quote = NULL) on its own also produces this error message.
I think I'm in the correct folder (because print(i) works fine). I've also changed the full.names option and set list.files as a variable outside the loop, but nothing seems to work. Please, if anybody has an idea, it'll be welcome!
I've tried it on randomly generated files and it works. You probably do not need to cd into the directory with the data; just pass the path to your data directory as the first argument of list.files.
GOfls <- list.files("indata", pattern = "^GO_.*\\.txt", full.names = TRUE)
head(GOfls)
[1] "indata/GO_amswylfbgp.txt" "indata/GO_amswylfbgptxt" "indata/GO_apqqqktvir.txt"
[4] "indata/GO_arwudmbzsr.txt" "indata/GO_autljyljgn.txt" "indata/GO_beeqcmnayk.txt"
# lapply -> do.call for reading and binding the data is a better approach
gene_data <- do.call('cbind', lapply(GOfls, function(path) read.delim(path)[,2]))
# have a look at the data
dim(gene_data)
[1] 100 100
I tried to reproduce your problem this way (this part is optional):
dir.create("indata")
fls <- lapply(1:100, function(i) data.frame(matrix(rnorm(1000), ncol = 10)))
names(fls) <- replicate(100, paste0("./indata/", "GO_",
paste0(sample(letters, 10, replace = T),
collapse = ""), ".txt"
)
)
lapply(names(fls), function(x) write.table(fls[[x]], x, quote = F, sep = "\t"))
head(dir("indata"))
[1] "GO_acebruujkw.pdf" "GO_amswylfbgp.txt" "GO_amswylfbgptxt" "GO_apqqqktvir.txt"
[5] "GO_arwudmbzsr.txt" "GO_autljyljgn.txt"
# I have added some renamed .txt files (.pdf, .tiff, .gel) to the indata
rm(list = ls())
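The do.call('cbind', ...) call above works because every generated file has the same number of rows. The question mentions a hand-made cbind.fill for columns of different lengths; its code is not shown, so the following is only a sketch of one way such padding could be done in base R, with pad_and_bind as a hypothetical name.

```r
# Sketch: bind columns of different lengths by padding the short ones with NA.
pad_and_bind <- function(columns) {
  longest <- max(lengths(columns))
  # Indexing a vector past its end yields NA, which does the padding.
  padded <- lapply(columns, function(x) x[seq_len(longest)])
  do.call(cbind, padded)
}

demo_cols <- list(a = c(1, 2, 3), b = c(10, 20))
demo_mat <- pad_and_bind(demo_cols)
```

In the reading loop, columns would be the list returned by lapply(GOfls, function(path) read.delim(path)[, 2]).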
That's solved! It's a bit strange, but after copying the folder of interest to the desktop the code works again.
A mate and I saw that the hard disk's activity had collapsed, so we thought there might be a problem with reading the files... so copying the folder was the (simple) solution!
Nevertheless, if anybody has an idea that explains this strange situation, I'm sure it'll be useful! Thanks a lot!
EDIT
I've done some tests and maybe the problem is the folder path, which is too long and crashes the loop.
I think it's because you're searching for .xls files, but then trying to open them as .txt files.
In Excel, try saving the files as comma- or tab-delimited text files.
If you want to open Excel files directly, there are a few packages that can do that. Try readxl.

Looping through files using dynamic name variable in R

I have a large number of files to import which are all saved as zip files.
From reading other posts it seems I need to pass the zip file name and then the name of the file I want to open. Since I have a lot of them I thought I could loop through all the files and import them one by one.
Is there a way to pass the name dynamically or is there an easier way to do this?
Here is what I have so far:
Temp_Data <- NULL
Master_Data <- NULL
file.names <- c("f1.zip", "f2.zip", "f3.zip", "f4.zip", "f5.zip")
for (i in 1:length(file.names)) {
zipFile <- file.names[i]
dataFile <- sub(".zip", ".csv", zipFile)
Temp_Data <- read.table(unz(zipFile,
dataFile), sep = ",")
Master_Data <- rbind(Master_Data, Temp_Data)
}
I get the following error:
In open.connection(file, "rt") :
I can import them manually using:
dt <- read.table(unz("D:/f1.zip", "f1.csv"), sep = ",")
I can create the string dynamically but it feels long-winded, and it doesn't work when I wrap it with read.table(unz(...)). It seems it can't find the file name and so throws an error:
cat(paste(toString(shQuote(paste("D:/",zipFile, sep = ""))),",",
toString(shQuote(dataFile)), sep = ""), "\n")
But if I then print this to the console I get:
"D:/f1.zip","f1.csv"
I can then paste this into read.table(unz(....)) and it works, so I feel like I am close.
I've tagged in data.table since this is what I almost always use so if it can be done with 'fread' that would be great.
Any help is appreciated
You can use the list.files command here.
First set your working directory to the folder where all your files are stored:
setwd("C:/Users/...")
then
file.names = list.files(pattern = "\\.zip$", recursive = F) # pattern is a regular expression, not a glob
then your for loop will be:
library(readr) # provides write_delim
Master_Data <- NULL
for (i in seq_along(file.names)) {
#open the files
zipFile <- file.names[i]
dataFile <- sub(".zip", ".csv", zipFile, fixed = TRUE)
Temp_Data <- read.table(unz(zipFile,
dataFile), sep = ",")
# your function for the opened file
Master_Data <- rbind(Master_Data, Temp_Data)
}
#write the combined data once, after the loop; "Master_Data.txt" is a placeholder name
#(writing to file.names[[i]] would overwrite the zip archives themselves)
write_delim(x = Master_Data, file = "Master_Data.txt", delim = "\t",
col_names = T)
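One visible difference between the manual call in the question and the loop is the "D:/" prefix: the loop passes a bare file name, which R resolves against the working directory. As a hedged sketch, the paths can be built explicitly instead of relying on setwd. The helper zip_targets is a hypothetical name, and the assumption that each archive holds a csv with the same base name is taken from the question.

```r
# Sketch: pair each zip archive with the csv expected inside it.
# Assumption: "f1.zip" contains "f1.csv", and so on, as in the question.
zip_targets <- function(zip_dir, zip_names) {
  data.frame(
    zip = file.path(zip_dir, zip_names),                    # full path to the archive
    inner = sub(".zip", ".csv", zip_names, fixed = TRUE),   # file name inside the archive
    stringsAsFactors = FALSE
  )
}

targets <- zip_targets("D:", c("f1.zip", "f2.zip"))

# Reading would then look like this (not run here, since it needs the real archives):
# parts <- Map(function(z, f) read.table(unz(z, f), sep = ","), targets$zip, targets$inner)
# Master_Data <- do.call(rbind, parts)
```

Using fixed = TRUE in sub treats ".zip" as literal text, so the dot cannot match an arbitrary character.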

How to bypass errors in for loops in r?

I created a for loop to merge several csv files in a directory together into one table. In some cases the files that are indicated in the loop have not been created. When the files do not exist the loop produces an error and no files are merged. I am trying to adjust the code so the loop inserts "NULL" or "error" in the parts of the matrix reserved for the files.
Here is the original code:
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd() #actual wd is removed for posting
for(i in length(dirnames)){
j<-dirnames[1] #Take the directory folder name
id<-gsub("_.*$","",dirnames[1]) #Take the numeric identifier of the indicator
fpath <- file.path(paste(j,"/",id,"_2016",".csv", sep = "")) #Merge the directory folder name and desired csv to a file path format
data<-read.csv(fpath,header = TRUE, as.is = TRUE)
last <- max(ncol(data))
COMP_raw[,(1+1)] <- data[,last]
colnames(COMP_raw)[(1+1)] <- names(data[last])
}
This above code works for every loop where the "fpath" actually exists in my directory. When the csv does not exist the following message occurs.
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '2.1_PermitIndirectCosts/2.1_2016.csv': No such file or directory
I looked at a few other posts to see how to solve the issue and tried the following
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
for(i in length(dirnames)){
j<-dirnames[1] #Take the directory folder name
id<-gsub("_.*$","",dirnames[1]) #Take the numeric identifier of the indicator
fpath <- file.path(paste(j,"/",id,"_2016",".csv", sep = "")) #Merge the directory folder name and desired csv to a file path format
possibleerror<- tryCatch(data<-read.csv(fpath,header = TRUE, as.is = TRUE),silent = TRUE),
error=function(e) e
)
if(!inherits(possibleerror,"error"))
{last <- max(ncol(data))
COMP_raw[,(1+3)] <- data[,last]
colnames(COMP_raw)[(1+3)] <- names(data[last])}
}
But that is still generating an error
What about using file.exists()?
file.exists returns a logical vector indicating whether the files named by its argument exist.
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd() #actual wd is removed for posting
for(i in length(dirnames)){
j <- dirnames[1] #Take the directory folder name
id <- gsub("_.*$","",dirnames[1]) #Take the numeric identifier of the indicator
fpath <- file.path(paste(j,"/",id,"_2016",".csv", sep = "")) #Merge the directory folder name and desired csv to a file path format
#Checks if file exists if not, assign NULL
if(file.exists(fpath)){
data <- read.csv(fpath,header = TRUE, as.is = TRUE)
last <- max(ncol(data))
COMP_raw[,(1+1)] <- data[,last]
colnames(COMP_raw)[(1+1)] <- names(data[last])
} else{
COMP_raw[,(1+1)] <- NA #Mark the column as missing; assigning NULL to a single column name raises an error
}
}
Not specific to your example (I'm on a mobile) but it should help:
var <- try(some_function()) # some_function() stands for the call that may fail
if(inherits(var, "try-error")){
some_other_function() # placeholder for whatever should happen instead
next}
If the call inside try fails, var holds an object of class "try-error", which you can handle accordingly. next will go to the next item in the loop.
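The same pattern can be written with tryCatch. The following is a minimal, self-contained sketch rather than a drop-in replacement for the loop in the question: read_or_null and the temporary files are made up for illustration, and a missing file simply leaves its slot empty.

```r
# Sketch: read a file if possible, otherwise return NULL instead of stopping.
read_or_null <- function(path) {
  tryCatch(
    # suppressWarnings hides the "cannot open file" warning that precedes the error.
    suppressWarnings(read.csv(path, header = TRUE, as.is = TRUE)),
    error = function(e) NULL
  )
}

# One file that exists and one that does not, in a fresh temporary directory.
demo_dir2 <- tempfile("trycatch_demo")
dir.create(demo_dir2)
present <- file.path(demo_dir2, "present.csv")
write.csv(data.frame(value = 1:3), present, row.names = FALSE)
absent <- file.path(demo_dir2, "absent.csv")

paths <- c(present, absent)
results <- vector("list", length(paths))
for (i in seq_along(paths)) {
  data <- read_or_null(paths[i])
  if (is.null(data)) next   # skip the missing file and continue with the loop
  results[[i]] <- data
}
```

Iterating with seq_along visits every path; a loop written as for(i in length(paths)) would run only once, for the last index.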
