I am trying to loop through all the names in a CSV file and retrieve Twitter data for each one with the following code:
require(twitteR)
require(data.table)
consumer_key <- 'KEY'
consumer_secret <- 'CON_SECRET'
access_token <- 'TOKEN'
access_secret <- 'ACC_SECRET'
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)
options(httr_oauth_cache=T)
accounts <- read.csv(file="FILE.CSV", header=FALSE, sep="")
Sample data in the CSV file (one name per row, in the first column):
timberghmans
alyssabereznak
JoshuaLenon
names <- lookupUsers(c(accounts))
for(name in names){
a <- getUser(name)
print(a)
b <- a$getFollowers()
print(b)
b_df <- rbindlist(lapply(b, as.data.frame))
print(b_df)
c <- subset(b_df, location!="")
d <- c$location
print(d)
}
However, it does not work. Every row of the file contains one Twitter screen name. When I type the names in directly like this:
names <- lookupUsers(c("USER1","USER2","USER3"))
it works perfectly. I also tried to loop through the accounts, but to no avail. Does someone have a general example, or could anyone give me a hint?
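A likely cause is that read.csv returns a data frame, so c(accounts) is a list wrapping the column rather than the plain character vector that the hand-typed call passes. The sketch below shows the difference without touching Twitter. It assumes the names sit in the first column (V1), and it uses an inline string (csv_text, a made-up stand-in) in place of FILE.CSV so that it runs on its own.

```r
# Stand-in for FILE.CSV: one screen name per row, no header
csv_text <- "timberghmans\nalyssabereznak\nJoshuaLenon"
accounts <- read.csv(text = csv_text, header = FALSE, sep = "",
                     stringsAsFactors = FALSE)

# c() on a data frame gives a list holding the column, not a character vector
wrapped <- c(accounts)
print(is.list(wrapped))

# Pull the first column out as a plain character vector instead
screen_names <- as.character(accounts$V1)
print(screen_names)

# With valid credentials this vector could then be passed on, e.g.
# names <- lookupUsers(screen_names)
```

If the loop still fails after that, it is worth checking what each element of names actually is before handing it to getUser.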
I want to extract lat/long data + file name from csv
I have done the following:
#libraries-----
library(readr)
library("dplyr")
library("tidyverse")
# set wd-----EXAMPLE
setwd("F:/mydata/myfiles/allcsv")
# have R read files as list -----
list <- list.files("F:/mydata/myfiles/allcsv", pattern=NULL, all.files=FALSE,
full.names=FALSE)
list
#lapply function
row.names<- c("Date=0", "Time=3", "Type=2", "Model=1", "Coordinates=nextrow", "Latitude = 38.3356", "Longitude = 51.3323")
AllData <- lapply(list, read.table,
skip=5, header=FALSE, sep=";", row.names=row.names, col.names=NULL)
PulledRows <-
lapply(AllData, function(DF)
DF[fileone$Latitude==38.3356, fileone$Longitude==51.3323]
)
# maybe i need to specify a for loop?
how my data looks
Thank you.
This should work for you. You may have to change the path if the .csv files are not in your working directory, and also the location where the final results are saved.
results <- data.frame(Latitude=NA,Longitude=NA,FileName=NA) #create empty dataframe
for(i in 1:length(list)){ # loop through each file obtained from list (called above)
dat <- read_csv(list[i],col_names = FALSE) # read in the ith dataset
df <- data.frame(dat[6,1],dat[7,1],list[i]) # create new dataframe with values from dat
df[,1] <- as.numeric(str_remove(df[,1],'Latitude=')) # remove text and make numeric
df[,2] <- as.numeric(str_remove(df[,2],'Longitude='))
names(df) <- names(results) # having the same column names allows next line
results <- rbind(results,df) # 'stacks' the results dataframe and df dataframe
}
results <- na.omit(results) # remove missing values (first row)
write_csv(results,'desired/path')
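The same idea can also be written without growing a data frame inside the loop. This is only a sketch in base R: extract_coords is a hypothetical helper, the demo files are made up, and it assumes that line 6 of each file holds the latitude and line 7 the longitude, as in the loop above.

```r
# Hypothetical helper: read one file and return a one-row data frame.
# Assumption: line 6 holds "Latitude=..." and line 7 holds "Longitude=...".
extract_coords <- function(path) {
  lines <- readLines(path)
  data.frame(
    Latitude  = as.numeric(sub("^Latitude *= *", "", lines[6])),
    Longitude = as.numeric(sub("^Longitude *= *", "", lines[7])),
    FileName  = basename(path),
    stringsAsFactors = FALSE
  )
}

# Two made-up demo files in a temporary directory
demo_dir <- file.path(tempdir(), "coords_demo")
dir.create(demo_dir, showWarnings = FALSE)
header <- c("Date=0", "Time=3", "Type=2", "Model=1", "Coordinates=nextrow")
writeLines(c(header, "Latitude=38.3356", "Longitude=51.3323"),
           file.path(demo_dir, "a.csv"))
writeLines(c(header, "Latitude = 40.1", "Longitude = 52.2"),
           file.path(demo_dir, "b.csv"))

# One row per file, stacked in a single step
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
results <- do.call(rbind, lapply(files, extract_coords))
print(results)
```

Because every file yields exactly one row, there is no placeholder NA row to remove afterwards.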
I have a for loop that loops through a list of urls,
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
'http://www.irs.gov/pub/irs-soi/05in21id.xls',
'http://www.irs.gov/pub/irs-soi/06in21id.xls',
'http://www.irs.gov/pub/irs-soi/07in21id.xls',
'http://www.irs.gov/pub/irs-soi/08in21id.xls',
'http://www.irs.gov/pub/irs-soi/09in21id.xls',
'http://www.irs.gov/pub/irs-soi/10in21id.xls',
'http://www.irs.gov/pub/irs-soi/11in21id.xls',
'http://www.irs.gov/pub/irs-soi/12in21id.xls',
'http://www.irs.gov/pub/irs-soi/13in21id.xls',
'http://www.irs.gov/pub/irs-soi/14in21id.xls',
'http://www.irs.gov/pub/irs-soi/15in21id.xls')
downloads an Excel file from each one, assigns it to a data frame, and performs a set of data cleaning operations on it.
library(gdata)
for (url in url_list){
test <- read.xls(url)
cols <- c(1,4:5,97:98)
test <- test[-(1:8),cols]
test <- test[1:22,]
test <- test[-4,]
test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
test$Total_returns <- test$X.2
test$return_dollars <- test$X.3
test$charitable_deductions <- test$X.95
test$charitable_deduction_dollars <- test$X.96
test[1:5] <- NULL
}
My problem is that the loop simply writes over the same dataframe for each iteration through the loop. How can I have it assign each iteration through the loop to a data frame with a different name?
Use assign. This question is a duplicate of this post: Change variable name in for loop using R
For your particular case, you can do something like the following:
for (i in 1:length(url_list)){
url = url_list[i]
test <- read.xls(url)
cols <- c(1,4:5,97:98)
test <- test[-(1:8),cols]
test <- test[1:22,]
test <- test[-4,]
test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
test$Total_returns <- test$X.2
test$return_dollars <- test$X.3
test$charitable_deductions <- test$X.95
test$charitable_deduction_dollars <- test$X.96
test[1:5] <- NULL
assign(paste("test", i, sep=""), test)
}
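Variables created with assign have to be looked up by name again later. A small sketch of how that might look, using toy data frames (the toy object and its columns are made up) instead of the downloaded files:

```r
# Create test1, test2, test3 the same way the loop above does
for (i in 1:3) {
  toy <- data.frame(id = i, value = i * 10)  # stand-in for the cleaned data
  assign(paste0("test", i), toy)
}

# Fetch a single data frame by its generated name
second <- get("test2")
print(second)

# Or collect all of them into a named list in one call
all_tests <- mget(paste0("test", 1:3))
print(names(all_tests))
```

Having to call get or mget to reach the results is one reason a list is often more convenient than separately named variables.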
You could write to a list:
result_list <- list()
for (i_url in 1:length(url_list)){
url <- url_list[i_url]
...
result_list[[i_url]] <- test
}
You can also name the list
names(result_list) <- c("df1","df2","df3",...)
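The names do not have to be typed by hand. As a sketch, they could be derived from the file names in the URLs. The df_ prefix is an arbitrary choice, only two of the URLs are used, and the data frames here are stand-ins rather than the downloaded tables.

```r
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
              'http://www.irs.gov/pub/irs-soi/05in21id.xls')

# "04in21id.xls" -> "04": strip the fixed suffix from each file name
year_tags <- sub("in21id\\.xls$", "", basename(url_list))

result_list <- vector("list", length(url_list))
for (i_url in seq_along(url_list)) {
  # stand-in for the downloaded and cleaned data frame
  result_list[[i_url]] <- data.frame(source = url_list[i_url],
                                     stringsAsFactors = FALSE)
}
names(result_list) <- paste0("df_", year_tags)

# Elements can now be reached by name
print(result_list[["df_05"]])
```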
Here's another approach with lapply instead of for loops which will write all resulting data.frames as separate list items which can then be re-named (if needed).
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
...
'http://www.irs.gov/pub/irs-soi/15in21id.xls')
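readURLFunc <- function(z){
# readxl reads local files only, so download to a temporary file first
tmp <- tempfile(fileext = ".xls")
download.file(z, tmp, mode = "wb")
test <- readxl::read_xls(tmp)
...
test[1:5] <- NULL
return(test)}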
data_list <- lapply(url_list, readURLFunc)
Suppose I have a CSV file with 2 columns, username and tweet. For each user, how can I get all the tweets he made into a list? For example, the list should be something like list(c(user1,tweet1,t2,t3),c(u2,t7,t8,t9),....)
usernameslist <- alldata$V5
usernameslist <- usernameslist[-1]
tweetslist <- alldata$V2
tweetslist <- tweetslist[-1]
user_and_his_tweets <- split(tweetslist,usernameslist, drop = FALSE )
mylist <- list()
for(i in 1:length(user_and_his_tweets)){
mylist <- list(mylist,c(names(user_and_his_tweets[i]),as.character(user_and_his_tweets[[i]])))
}
This is what I tried. But "mylist" is not in the format I wanted.
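The nesting probably comes from list(mylist, ...), which wraps the previous result one level deeper on every pass. A sketch of one possible fix is to assign into position i instead. The alldata frame below is made-up sample data that mimics the layout in the question (header text in the first row, tweets in V2, usernames in V5).

```r
# Made-up sample data: first row is the header text, as in the question
alldata <- data.frame(
  V2 = c("tweet", "t1", "t2", "t7", "t3"),
  V5 = c("username", "user1", "user1", "u2", "user1"),
  stringsAsFactors = FALSE
)

usernameslist <- alldata$V5[-1]
tweetslist <- alldata$V2[-1]
user_and_his_tweets <- split(tweetslist, usernameslist)

# Fill a pre-sized list slot by slot instead of re-wrapping it each time
mylist <- vector("list", length(user_and_his_tweets))
for (i in seq_along(user_and_his_tweets)) {
  mylist[[i]] <- c(names(user_and_his_tweets)[i],
                   as.character(user_and_his_tweets[[i]]))
}
str(mylist)
```

Each element is then one character vector with the username first, followed by that user's tweets.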
I know how to delete rows in a sequence for a SINGLE list:
data <- data.table('A' = c(1,2,3,4), 'B' = c(900,6,'NA',2))
row.remove <- data[!(data$A %in% seq(from=1,to=4,by=2))]
However, I would like to know how to do so with MULTIPLE lists.
Code I've tried:
file.number <- c(1:5)
data <- setNames(lapply(paste0(file.number,".csv"), read.csv), paste(file.number)) # this line imports the lists from csv files - works
data.2 <- lapply(data, data.table) # seems to work
row.remove <- lapply(data.2, function(x) x[!(data.2$A %in% seq(from=1,to=4,by=2))]) # no error message, but deletes all the rows
I feel like I'm missing something obvious, any help will be greatly appreciated.
Solution:
for (i in 1:5){
file.number = i
data <- read.csv(paste0(file.number, ".csv")) # paste0 avoids the space that paste() puts before ".csv"
data <- as.data.table(data)
row.remove <- data[!(data$A %in% seq(from=1,to=4,by=2))] # %in% tests membership in the sequence
}
Instead of processing all the lists at once, this processes them one by one. It's not a full solution, more of a workaround.
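The lapply attempt in the question most likely returns no rows because the anonymous function indexes with data.2$A, which refers to the whole list (which has no element A), rather than with x$A, the table currently being processed. A sketch of the all-at-once version is below. It uses plain data frames and made-up values so that it runs without any files or packages; the same filter should carry over to data.table objects.

```r
# Made-up stand-ins for the imported files
data.2 <- list(
  "1" = data.frame(A = c(1, 2, 3, 4), B = c(900, 6, NA, 2)),
  "2" = data.frame(A = c(1, 2, 3, 4), B = c(5, 6, 7, 8))
)

drop_values <- seq(from = 1, to = 4, by = 2)  # 1 and 3

# Refer to x (the current element), not to the list data.2
row.remove <- lapply(data.2, function(x) x[!(x$A %in% drop_values), , drop = FALSE])
print(row.remove)
```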
I am trying to use a for loop to read CSV files whose names are dates, and then print a few columns of data from each file that is actually there. I need to skip over the dates that I don't have any data for and the dates that don't actually exist. When I run my code there is no output, it is just blank. Why doesn't my code work?
options(width=10000)
options(warn=2)
for(a in 3:5){
for(b in 0:1){
for(c in 0:9){
for(d in 0:3){
for(e in 0:9){
mydata=try(read.csv(paste("201",a,b,c,d,e,".csv",sep="")), silent=TRUE)
if(class(mydata)=="try-error"){next}
else{
mydata$Data <- do.call(paste, c(mydata[c("LAST_UPDATE_DT","px_last")], sep=""))
print(t(mydata[,c('X','Data')]))
}
}}}}}
That's a really terrible way to read in all your files. Try this:
f <- list.files(pattern="\\.csv$") # pattern is a regular expression, not a shell glob
mydata <- sapply(f, read.csv, simplify=FALSE)
This will return a list mydata of data frames, each of which is the contents of the corresponding file.
Or, if there are other csv files that you don't want to read in, you can restrict the specification:
f <- list.files(pattern="201\\d{5}\\.csv")
And to combine everything into one big data frame:
do.call(rbind, mydata)
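If the file names need to follow the calendar, as in the question, one option is to generate the real dates first and keep only the files that exist. This is a sketch under assumptions: the demo directory, the two demo files, and the date range are all made up, and the files are assumed to be named like 20130102.csv.

```r
# Two made-up demo files in a temporary directory
demo_dir <- file.path(tempdir(), "dated_csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(px_last = 1.5), file.path(demo_dir, "20130102.csv"),
          row.names = FALSE)
write.csv(data.frame(px_last = 2.5), file.path(demo_dir, "20130104.csv"),
          row.names = FALSE)

# seq() over Date objects only yields dates that really exist
all_days <- seq(as.Date("2013-01-01"), as.Date("2013-01-05"), by = "day")
candidates <- file.path(demo_dir, paste0(format(all_days, "%Y%m%d"), ".csv"))

# Keep the dates that actually have a file, then read and stack them
present <- candidates[file.exists(candidates)]
mydata <- lapply(present, read.csv)
names(mydata) <- basename(present)
combined <- do.call(rbind, mydata)
print(combined)
```

This avoids both the nested digit loops and the try() call, since nonexistent dates are never generated and missing files are filtered out before reading.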