how to loop two loops one in another - r

I have a list (lst1) which contains 10 data frames. Each data frame has a variable ID. I also have an IDlist. Is there a way to write a loop that generates one Excel workbook per ID in IDlist, each containing those 10 data frames, one per sheet, filtered to the matching ID?
The tricky part is that we need to loop through IDlist as well as through lst1. Any advice?
I have written some code, but it doesn't work. Hopefully it gives you an idea of what I want to do.
for (i in IDlist) {
# create a workbook
tempwb <- createWorkbook()
for(j in seq_along(lst1)){
# store the ID -specific subset of the dataset
data.subset <-lst1[[j]] %>% filter(ID == i)
# add worksheet
addWorksheet(tempwb, sheetName = lst1[[j]])} # I want the sheetname= dataname, what should I do? mine should be wrong
## How can I load subset to each sheet?
file.name <- paste0(i,".xlsx")
### save workbook
saveWorkbook(tempwb, paste0(output_dir,file.name), overwrite = TRUE)
}

library(tidyverse)
library(writexl)
lst1 <- list(data1 = mpg, data2 = mpg, data3 = mpg, `data4/\\bad name` = mpg)
# Remove any illegal characters from names:
names(lst1) <- names(lst1) %>%
stringr::str_replace_all("[:punct:]", " ")
IDlist <- mpg %>% pull(cyl) %>% unique
make_one_xlsx <- function(this_id){
lst1 %>% map(~filter(., cyl == this_id)) %>% write_xlsx(paste0("ID_", this_id, ".xlsx"))
}
IDlist %>% map(make_one_xlsx)
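If you would rather keep the nested loop from the question, here is a minimal sketch using openxlsx. It assumes lst1 is a named list (so names(lst1) can serve as sheet names) and uses two small made-up data frames, a made-up IDlist, and tempdir() as output_dir purely for illustration. writeData() is what loads each subset into its sheet.

```r
library(openxlsx)

# Hypothetical stand-ins for the real lst1 and IDlist
lst1 <- list(
  data1 = data.frame(ID = c(1, 2, 2), x = c("a", "b", "c")),
  data2 = data.frame(ID = c(1, 1, 2), y = c(10, 20, 30))
)
IDlist <- c(1, 2)
output_dir <- tempdir()  # assumption: replace with your own folder

for (i in IDlist) {
  # one workbook per ID
  tempwb <- createWorkbook()
  for (j in seq_along(lst1)) {
    # rows of the j-th data frame that belong to this ID
    data_subset <- lst1[[j]][lst1[[j]]$ID == i, , drop = FALSE]
    # use the list element's name as the sheet name
    sheet <- names(lst1)[j]
    addWorksheet(tempwb, sheetName = sheet)
    writeData(tempwb, sheet = sheet, x = data_subset)
  }
  # save after all sheets for this ID have been added
  saveWorkbook(tempwb, file.path(output_dir, paste0(i, ".xlsx")), overwrite = TRUE)
}
```

Note that the sheet name comes from names(lst1)[j], not lst1[[j]], which is the data frame itself.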

You should show an example of your data to get better answers.
You could use the xlsx library:
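library('xlsx')
for (i in seq_along(lst1)) {
  # create the file on the first pass, then append one sheet per data frame;
  # [[i]] extracts the data frame itself and names(lst1)[i] gives a unique sheet name
  write.xlsx(lst1[[i]], file = 'file_name.xlsx',
             sheetName = names(lst1)[i], append = i > 1)
}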

Related

R: Read multiple sheets from a XLS file and rbind them with skip rows and setnames

I have multiple sheets in an Excel file and I would like to row-bind all of them into one single data frame. The first sheet has 3 rows that I have to skip, which looks something like this.
unneededrow1
unneededrow2
unneededrow3
Date Category CatID Revenue
1/1/2022 Shop 1 1203
1/1/2022 Online 2 3264
2/1/2022 Shop 1 1423
2/1/2022 Online 2 2464
For Sheet2, Sheet3, Sheet4, and onwards, I have data without column names, which looks something like the following.
3/1/2022 Shop 1 2454
3/1/2022 Online 2 4333
4/1/2022 Shop 1 2234
4/1/2022 Online 2 4565
My initial approach was to set colnames = FALSE for all sheets and rbind them, but this results in a mismatch of data types. I have looked up and tried other solutions but still couldn't achieve what I need. I'd appreciate any help here, and thanks in advance.
Suppose you have files such as doc1.xlsx, doc2.xlsx, doc3.xlsx and so on in the current working directory. Then you can read the whole table like this:
library(tidyverse)
library(readxl)
tibble(path = list.files(".", ".xlsx")) %>%
mutate(
has_header = path == "doc1.xlsx",
data = path %>% map2(has_header, ~ {
if (.y) {
read_excel(.x, skip = 3)
} else {
read_excel(.x, col_names = c("Date", "Category", "CatID", "Revenue"))
}
})
) %>%
pull(data) %>%
bind_rows()
If you have multiple sheets in the same file you can do this instead:
library(tidyverse)
library(readxl)
path <- "data.xlsx"
tibble(sheet = excel_sheets(path)) %>%
mutate(
has_header = sheet == "Sheet1",
data = sheet %>% map2(has_header, ~ {
if (.y) {
read_excel(path, sheet = .x, skip = 3)
} else {
read_excel(path, sheet = .x, col_names = c("Date", "Category", "CatID", "Revenue"))
}
})
) %>%
pull(data) %>%
bind_rows()
Building on @danloo's answer.
# Looping over files
# Load packages
library(tidyverse)
library(readxl)
# Transforming answer above into a function
importFunction <- function(path = "your/filepath/and/filename.xls"){
test <-tibble(sheet = excel_sheets(path)) %>%
mutate(
has_header = sheet == "Sheet1",
data = sheet %>% map2(has_header, ~ {
if (.y) {
read_excel(path, sheet = .x, skip = 3)
} else {
read_excel(path, sheet = .x, col_names = c("Date", "Category", "CatID", "Revenue"))
}
})
) %>%
pull(data) %>%
bind_rows()
return(test)
}
# Performing the loop
# List all files in a directory containing some filename pattern
filesList <- list.files(path=".", pattern=".xls", all.files=TRUE, full.names=TRUE) # Remember to change pattern argument as you see fit
# Create empty dataframe to store files' data, by initialising it with column names tied to empty vectors
df <- data.frame(Date=as.Date(character()), Category=character(), CatID=character(), Revenue=double(), stringsAsFactors=FALSE)
# Now load every file in filesList
for (file in filesList)
{
  dfFile = importFunction(file)
  dfFile = as.data.frame(dfFile)
  df = rbind(df, dfFile)
}
# Show the first rows of df after the loop
head(df)

Retain value from nested for loop

So basically I am trying the following loop:
rawData = read.csv(file = "SampleData.csv")
companySplit = split(rawData, rawData$Company)
NameOfCompany <- numeric()
DateOfOrder <- character()
WhichProducts <- numeric()
for (i in 1:length(companySplit)){
company_DateSplit = split(companySplit[[i]], companySplit[[i]]$Date)
for (j in 1:length(company_DateSplit)){
WhichProducts[j] <- (paste0(company_DateSplit[[j]]$ID, collapse=","))
DateOfOrder[j] <- (paste0(company_DateSplit[[j]]$Date[1]))
NameOfCompany[j] <- (paste0(companySplit[[i]]$Company[[1]]))
}
}
df <- data.frame(NameOfCompany,DateOfOrder, WhichProducts)
write.csv(df, file = "basket.csv")
If you check basket.csv, there is output only for company D. I guess the other companies are not written because of the nesting of the for loops, and I can't work out how to fix it.
I need exactly the output in basket.csv, but for all companies.
Here are the CSVs:
Input Data: Link
Output of code basket.csv: Link
The output should look like this:
Company, Date, all IDs comma separated.
e.g.
A,Jan-18,(1,2,4)
A,Feb-18,(1,4)
B,Jan-18,(2,3,4)
I can get this from the code above, but I am not able to save it to CSV for all of companies A, B, C, and D. It saves values only for company D, which is the last value in the loop (check the output file link).
The first problem is that you import your data without the parameter stringsAsFactors = FALSE, which is a very common mistake. Also, looping in R is usually less efficient and harder to reason about than a more functional approach. I think what you're trying to do can be done with the aggregate function:
rawData <- read.csv(file = "SampleData.csv", stringsAsFactors = FALSE)
df <- aggregate(ID ~ Company + Date, data = rawData, FUN = paste, collapse = ",")
colnames(df) <- c("NameOfCompany", "DateOfOrder", "ID")
df = split(df, df$NameOfCompany)
Or using a tidy approach:
library(dplyr)
df <- rawData %>% group_by(Company, Date) %>%
  summarise(WhichProducts = paste(ID, collapse = ',')) %>%
  rename(DateOfOrder = Date) %>%
  rename(NameOfCompany = Company) %>%
  group_split()
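If you want to keep the original nested loop, the reason only company D survives is that the inner index j restarts at 1 for every company, so each company overwrites the rows written by the previous one. Below is a minimal sketch of one fix, using a running counter k (a name introduced here) and a small made-up rawData in place of SampleData.csv.

```r
# Made-up sample data standing in for SampleData.csv
rawData <- data.frame(
  Company = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date    = c("Jan-18", "Jan-18", "Jan-18", "Feb-18", "Feb-18",
              "Jan-18", "Jan-18", "Jan-18"),
  ID      = c(1, 2, 4, 1, 4, 2, 3, 4),
  stringsAsFactors = FALSE
)

companySplit <- split(rawData, rawData$Company)
NameOfCompany <- character()
DateOfOrder <- character()
WhichProducts <- character()

k <- 0  # running row counter shared by both loops
for (i in seq_along(companySplit)) {
  company_DateSplit <- split(companySplit[[i]], companySplit[[i]]$Date)
  for (j in seq_along(company_DateSplit)) {
    k <- k + 1  # never reset, so earlier companies are not overwritten
    WhichProducts[k] <- paste0(company_DateSplit[[j]]$ID, collapse = ",")
    DateOfOrder[k] <- company_DateSplit[[j]]$Date[1]
    NameOfCompany[k] <- companySplit[[i]]$Company[1]
  }
}

df <- data.frame(NameOfCompany, DateOfOrder, WhichProducts)
print(df)
```

From here, write.csv(df, file = "basket.csv") writes one row per company and date.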

How to scrape over a loop from a database?

I am trying to scrape data from a database that doesn't allow downloading directly. I have been able to scrape data from a single species but I am trying to do it for 159 species. This is why I wanted to create a loop that could be helpful
test <- data.frame(site = c("url=1",
"url=2"),
html.node = "td.DataText", stringsAsFactors = F)
library(rvest)
# an empty list, to fill with the scraped data
empty_list <- list()
for (i in 1:nrow(test)){
datatext <- pubs[i, 1]
datatext2 <- pubs[i, 2]
# scrape it!
empty_list[[i]] <- read_html(datatext) %>% html_nodes(datatext2) %>% html_text()
}
names(empty_list) <- test$site
empty <- as.data.frame(empty_list)
This is what I've tried so far. This is only for 2 species as indicated by FID=1 and FID=2 in the URL. There are 159 species. This is why I wanted a for loop that goes from 1:159 and populates the dataframe as it does with this current code.
I was able to figure it out!
url="url=1"
webpage <- read_html(url)
Data.Label <- webpage %>%
html_nodes("td.DataLabel") %>%
html_text()
Label <- as.data.frame(t(Data.Label))
#Obtains the data labels in a dataframe that is transposed.
Data.Text <- lapply(paste0('url=', 1:159),
function(url){
url %>% read_html() %>%
html_nodes("td.DataText") %>%
html_text()
})
#Creates a list of all the data text needed to populate the table
Eco.Table <- as.data.frame(Data.Text)
#Convert list into dataframe.
Eco.Table <- Eco.Table[-c(39:42), ]
#Remove irrelevant rows
Eco.Table <- as.data.frame(t(Eco.Table))
#Transpose the dataframe into rows
rownames(Eco.Table) <- NULL
colnames(Eco.Table) <- as.character(unlist(Label))
#Reset row names and add column labels
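The same pattern can be tried offline before hitting the real site. The sketch below is only an illustration: it feeds two made-up HTML snippets to read_html() in place of the real URLs, and the helper name scrape_one is invented here. It assumes each page pairs every td.DataLabel cell with one td.DataText cell, which may not hold for the actual database.

```r
library(rvest)

# Made-up HTML standing in for two species pages
pages <- c(
  '<table><tr><td class="DataLabel">Name</td><td class="DataText">Species one</td></tr>
   <tr><td class="DataLabel">Habitat</td><td class="DataText">Forest</td></tr></table>',
  '<table><tr><td class="DataLabel">Name</td><td class="DataText">Species two</td></tr>
   <tr><td class="DataLabel">Habitat</td><td class="DataText">Wetland</td></tr></table>'
)

# Hypothetical helper: one page in, one named list of values out
scrape_one <- function(page) {
  doc <- read_html(page)
  labels <- html_text(html_nodes(doc, "td.DataLabel"))
  values <- html_text(html_nodes(doc, "td.DataText"))
  setNames(as.list(values), labels)
}

# One row per page, with the labels as column names
rows <- lapply(pages, scrape_one)
Eco.Table <- do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
print(Eco.Table)
```

With the real site, pages would be the vector of URLs built with paste0(), as in the lapply call above.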

pulling multiple entries in xml with different data using R

I have a set of XML files that I am reading in, and wanted to know the best way to deal with the following:
<MyDecision>
<Decision>
<DecisionID>X1234</DecisionID>
<DecisionReasons xmlns:a="http://schemas.datacontract.org/2004/07/Contracts">
<a:Reason>
<a:Description>DOBMismatch</a:Description>
</a:Reason>
<a:Reason>
<a:Description>PrimaryChecksFail</a:Description>
</a:Reason>
<a:Reason>
<a:Description>IncomeReferral</a:Description>
</a:Reason>
</DecisionReasons>
</Decision>
</MyDecision>
At the moment, I am running some R code but get the response:
Error: Duplicate identifiers for rows (2, 3, 4)
The intended output is a dataframe that looks something like:
fieldname |contents
MyDecision_Decision_DecisionID |X1234
MyDecision_Decision_DecisionReasons_Reason_Description_DOBMismatch |DOBMismatch
MyDecision_Decision_DecisionReasons_Reason_Description_PrimaryChecksFail |PrimaryChecksFail
MyDecision_Decision_DecisionReasons_Reason_Description_IncomeReferral |IncomeReferral
My current code is as below:
library(profvis)
library(XML)
library(xml2)
library(plyr)
library(tidyverse)
library(reshape2)
library(foreign)
library(rio)
setwd('c:/temp/xml/t')
df <- data.frame()
transposed.df1 <- data.frame()
allxmldata <- data.frame()
inputfiles <- as.character('test.xml')
findchildren<-function(nodes, df) {
numchild <- sapply(nodes, function(x){length(xml_children(x))})
xmlvalue <- xml_text(nodes[numchild==0])
xmlname <- xml_name(nodes[numchild==0])
xmlpath <- sapply(nodes[numchild==0], function(x) {gsub(', ','_', toString(rev(xml_name(xml_parents(x)))))})
if (isTRUE(xmlpath == 'MyDecision_Decision_DecisionReasons_Reason')) {
fieldname <- paste(xmlpath,xmlname,xmlvalue,sep = '_')
} else {
fieldname <- paste(xmlpath,xmlname,sep = '_')
}
contents <- sapply(xmlvalue, function(f){is.na(f)<-which(f == '');f})
dftemp <- data.frame(fieldname, contents)
df <- rbind(df, dftemp)
print(dim(df))
if (sum(numchild)>0){
findchildren(xml_children(nodes[numchild>0]), df) }
else{ return(df)}
}
for (x in inputfiles) {
df1 <- findchildren(xml_children(read_xml(x)),df)
xml.df1 <- data.frame(spread(df1, key = fieldname, value = contents), fix.empty.names = TRUE)
allxmldata <- rbind.fill(allxmldata,xml.df1)
}
I hope that there is someone that can point out what I have done wrong...
I came up with the following solution that works for your exemplary dataset. Hopefully, this approach can also be used for your larger dataset.
# we need these two packages
library(xml2)
library(tidyverse)
# read in the xml-file
xml <- read_xml(
'<MyDecision>
<Decision>
<DecisionID>X1234</DecisionID>
<DecisionReasons xmlns:a="http://schemas.datacontract.org/2004/07/Contracts">
<a:Reason>
<a:Description>DOBMismatch</a:Description>
</a:Reason>
<a:Reason>
<a:Description>PrimaryChecksFail</a:Description>
</a:Reason>
<a:Reason>
<a:Description>IncomeReferral</a:Description>
</a:Reason>
</DecisionReasons>
</Decision>
</MyDecision>'
)
xml %>%
# first find all nodes that don't contain any child nodes
xml_find_all(".//node()[not(node())]") %>%
# find the parent of each node
map(xml_parent) %>%
# extract name and text of each of the childless nodes
map(~list(name = xml_name(.x), text = xml_text(.x))) %>%
# bind rows.
bind_rows()
This produces the following output:
# A tibble: 4 x 2
name text
<chr> <chr>
1 DecisionID X1234
2 Description DOBMismatch
3 Description PrimaryChecksFail
4 Description IncomeReferral
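The output above has only the node name, not the full path asked for in the question. A possible extension is sketched below, assuming the same example XML: it builds each field name from the ancestors returned by xml_parents() and appends the value only where a path occurs more than once. The helper name full_path is made up for this sketch.

```r
library(xml2)

doc <- read_xml(
'<MyDecision>
<Decision>
<DecisionID>X1234</DecisionID>
<DecisionReasons xmlns:a="http://schemas.datacontract.org/2004/07/Contracts">
<a:Reason><a:Description>DOBMismatch</a:Description></a:Reason>
<a:Reason><a:Description>PrimaryChecksFail</a:Description></a:Reason>
<a:Reason><a:Description>IncomeReferral</a:Description></a:Reason>
</DecisionReasons>
</Decision>
</MyDecision>'
)

# elements that have no element children (the leaves)
leaves <- xml_find_all(doc, ".//*[not(*)]")

# hypothetical helper: join ancestor names (outermost first) and the node's own name
full_path <- function(node) {
  ancestors <- rev(xml_name(xml_parents(node)))
  paste(c(ancestors, xml_name(node)), collapse = "_")
}

result <- data.frame(
  fieldname = vapply(leaves, full_path, character(1)),
  contents = xml_text(leaves),
  stringsAsFactors = FALSE
)

# repeated paths get the value appended so that every fieldname is unique
dup <- duplicated(result$fieldname) | duplicated(result$fieldname, fromLast = TRUE)
result$fieldname[dup] <- paste(result$fieldname[dup], result$contents[dup], sep = "_")
print(result)
```

Making the field names unique is also what avoids the "Duplicate identifiers" error when the result is later passed to spread().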

how to write function within function to save objects as separated csv file?

I have a function as follows:
v2_city <-
function(city_name){
result <- v2[city == city_name]
result
}
v2 is a dataset that has the columns "city_name" and "offers".
I would like to create a filter function that saves the filtered data as separate objects and ultimately writes them to CSV files.
To do so, I put the city names in a list and wanted to use a for loop as follows:
list <- c(
'Osaka'
,'Paris'
,'Roma'
,'Barcelona'
,'Fukuoka'
,'Hong Kong')
for (item in list){
x <- v2[item]
}
This leaves x filtered to Hong Kong only. How do I save all the separate subsets as objects and then write them to CSV within the loop?
How about this?
library(dplyr)
library(readr)
my_dataset = tibble(city = c("Osaka","Paris","Roma","Barcelona","Fukuoka","Hong Kong"), value = 1:6)
cities = c("Osaka","Paris","Roma","Barcelona","Fukuoka","Hong Kong")
for (j in seq_along(cities)) {
my_dataset %>%
filter(city == cities[[j]]) %>%
write_csv(paste0(cities[[j]],".csv"))
}
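If you also want the subsets kept as objects, a base R sketch along these lines may help. It assumes a plain data frame with a city column (the v2 below is made up) and writes to tempdir() for illustration; split() returns a named list holding one data frame per city.

```r
# Made-up stand-in for the real v2
v2 <- data.frame(
  city = c("Osaka", "Paris", "Osaka", "Roma"),
  offers = c(3, 5, 2, 7),
  stringsAsFactors = FALSE
)

# one data frame per city, kept together in a named list
by_city <- split(v2, v2$city)

out_dir <- tempdir()  # assumption: replace with your own folder
for (city_name in names(by_city)) {
  # each list element is written to its own file, e.g. Osaka.csv
  write.csv(by_city[[city_name]],
            file.path(out_dir, paste0(city_name, ".csv")),
            row.names = FALSE)
}
```

Each subset stays available afterwards as by_city[["Osaka"]], by_city[["Paris"]], and so on, which avoids overwriting a single x on every pass.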
