I've got the following data frame:
Loans <- data.frame(
ID = c("215781","832567","721536"),
From = c("01-01-2023","04-15-2022","09-23-2021"),
End = c("05-02-2023","10-15-2023","12-23-2021"),
Type = c("Monthly","Quarterly","Monthly"))
I need to create another data frame that has one row for every intra-period date of each loan, holding the ID and the date. The loop I've written isn't right, but it shows the idea of what I want to do. It works for one row if you delete the loop part:
library(bizdays)
Base <- data.frame("TM",today())
colnames(Base) <- c("TM","InterestDates")
for (i in Loans[i,]){
df <- as.data.frame(seq.Date(Loans$From,Loans$Until,by="month"))
colnames(df) <- "InterestDates"
df$TM <- Loans$TM
Base <- rbind(Base,df)
}
Something like this would be the expected output
ID | InterestDates
250414 | 2022-05-16
250414 | 2022-06-16
250414 | 2022-07-18
250414 | 2022-08-16
So I'm guessing you'd want something like this:
library(bizdays)
Loans <- data.frame(
ID = c("215781","832567","721536"),
From = c("01-01-2023","04-15-2022","09-23-2021"),
End = c("05-02-2023","10-15-2023","12-23-2021"),
Type = c("Monthly","Quarterly","Monthly"))
Base <- data.frame("ID" = character(), "InterestDates" = as.Date(character()))
# %Y parses the four-digit year; %y would read only the first two digits
Loans$From <- as.Date(Loans$From, format = "%m-%d-%Y")
Loans$End <- as.Date(Loans$End, format = "%m-%d-%Y")
for (i in 1:nrow(Loans)) {
  if (Loans$Type[i] == "Monthly") {
    seq_dates <- seq.Date(Loans$From[i], Loans$End[i], by = "month")
  } else if (Loans$Type[i] == "Quarterly") {
    seq_dates <- seq.Date(Loans$From[i], Loans$End[i], by = "quarter")
  }
  df <- data.frame("ID" = rep(Loans$ID[i], length(seq_dates)), "InterestDates" = seq_dates)
  Base <- rbind(Base, df)
}
There are several issues in your original code:
Base <- data.frame("TM",today()) creates a data frame that already has one entry, not an empty data frame.
The columns From and End of the Loans data frame are not in the Date format that seq.Date requires.
Loans does not have a column TM, but judging from your expected output you want the ID column anyway. Likewise, there is no Until column; the end date is stored in End.
for (i in Loans[i,]) does not work because i does not exist yet: you cannot define i in terms of i. Please look into how loops work in R.
The loop index is never used inside the loop. If you want the i-th entry of a column of a data frame, access it via Loans$From[i].
Also, I'm not quite sure: do you want by = "month" for every entry of your original data frame, or should it depend on the Type column of the Loans data frame?
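For what it's worth, here is a loop-free sketch of the same idea. It assumes the dates use a four-digit year (%m-%d-%Y) and, judging from the 2022-07-18 entry in the expected output, that dates falling on a weekend should roll forward to the next Monday. The helper roll_to_weekday and the step lookup are names made up for illustration; holidays are ignored, so a proper business calendar would be needed for those.

```r
Loans <- data.frame(
  ID   = c("215781", "832567", "721536"),
  From = as.Date(c("01-01-2023", "04-15-2022", "09-23-2021"), format = "%m-%d-%Y"),
  End  = as.Date(c("05-02-2023", "10-15-2023", "12-23-2021"), format = "%m-%d-%Y"),
  Type = c("Monthly", "Quarterly", "Monthly")
)

# Translate each loan type into a step that seq() understands.
step <- c(Monthly = "month", Quarterly = "quarter")

# Assumption: weekend dates roll forward to the following Monday (holidays ignored).
roll_to_weekday <- function(d) {
  wd <- as.POSIXlt(d)$wday  # 0 = Sunday, 6 = Saturday
  d + ifelse(wd == 6, 2, ifelse(wd == 0, 1, 0))
}

# Build one small data frame per loan, then stack them.
pieces <- lapply(seq_len(nrow(Loans)), function(i) {
  dates <- seq(Loans$From[i], Loans$End[i], by = step[[Loans$Type[i]]])
  data.frame(ID = Loans$ID[i], InterestDates = roll_to_weekday(dates))
})

Base <- do.call(rbind, pieces)
rownames(Base) <- NULL
```

Building the pieces in a list and binding once at the end avoids growing Base inside a loop, which matters once there are many loans.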
I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))
You can set the names within assign, like this:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
  assign(x = paste0('county_', file_vars[f]),
         value = {
           df <- data.frame(array(NA, c(length(days), 67)))
           colnames(df) <- paste0("fancy_column_",
                                  sample(LETTERS, size = ncol(df), replace = TRUE))
           df
         })
}
Inside the {} you can use colnames(df) or setNames to assign column names in any manner desired. Your first piece of code refers to an sd_counties object that is not available here, but the general idea should work for you.
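If the end result is "just a bunch of data frames", an alternative that avoids assign altogether is to keep them in a named list. The sketch below assumes 66 county names; county_names is a hypothetical stand-in for sd_counties$NAME, which is not shown in the question.

```r
file_vars <- c("file1", "file2")
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), by = "day")

# Hypothetical placeholder for sd_counties$NAME (66 names plus the Date column = 67).
county_names <- paste0("county_", 1:66)

# One data frame per entry of file_vars, each with the same column names.
county_dfs <- lapply(file_vars, function(f) {
  df <- data.frame(array(NA, c(length(days), 67)))
  colnames(df) <- c("Date", county_names)
  df$Date <- days
  df
})
names(county_dfs) <- paste0("county_", file_vars)
```

Each data frame is then available as county_dfs$county_file1 or county_dfs[["county_file2"]], and the whole set can be processed with lapply.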
I want to rearrange the data in a data frame that is originally in the following format, with 3 rows for one project.
I extracted it using the "rvest" package:
library(rvest)
library(plyr)
library(dplyr)
projects<-NULL
thepage = read_html("https://www.99acres.com/search/project/buy/residential/pune?search_type=QS&search_location=SH&lstAcn=NPSEARCH&lstAcnId=9753976212484323&src=CLUSTER&preference=S&city=19&res_com=R&selected_tab=3&isvoicesearch=N&keyword=pune&strEntityMap=IiI%3D&refine_results=Y&Refine_Localities=Refine%20Localities&action=%2Fdo%2Fquicksearch%2Fsearch&searchform=1&price_min=null&price_max=null")
table = data.frame(html_table(x = thepage, fill = TRUE))
table = as.data.frame(t(table))
ResidentialProjects<-rbind(projects,setNames(table, names(table)))
I want all the details about one project (Real Estate Project Name) in one row.
I tried writing code for it:
newdf<-data.frame(matrix(ncol = 10),stringsAsFactors = FALSE)
df=ResidentialProjects
projectName=""
count<-0
for(n in 1:nrow(df)){
if(df[n,]$V1!=projectName){
count = count+1
projectName=df[n,]$V1
newdf[count,c(1,2,3,4)]=df[n,c(1,2,3,4)]
newdf[count,c(5,6,7)]=df[n+1,c(2,3,4)]
newdf[count,c(8,9,10)]=df[n+2,c(2,3,4)]
}else{
print(n)
next
}
}
But it's giving me a table of numbers (the output of newdf).
What is the problem? Or is there a better option?
I have the following .csv file:
https://drive.google.com/open?id=0Bydt25g6hdY-RDJ4WG41VFpyX1k
I would like to take the date and the agent name (pasting its constituent parts together) and append them as columns to the right of the table for every row until a different name and date appear, then do the same for the remaining name and date items, to get the following result:
The only thing I have been able to do with the dplyr package is the following:
library(dplyr)
library(stringr)
report <- read.csv(file ="test15.csv", head=TRUE, sep=",")
date_pattern <- "(\\d+/\\d+/\\d+)"
date <- str_extract(report[,2], date_pattern)
report <- mutate(report, date = date)
Which gives me the following result:
The difficulty I am having is probably in using conditionals to make the script pick up the appropriate string and append it as a column at the end of the table.
This might be crude, but I think it illustrates several things: a) setting stringsAsFactors=F; b) "pre-allocating" the columns in the data frame; and c) using the column name instead of column number to set the value.
report<-read.csv('test15.csv', header=T, stringsAsFactors=F)
# first, allocate the two additional columns (with NAs)
report$date <- rep(NA, nrow(report))
report$agent <- rep(NA, nrow(report))
# step through the rows
for (i in 1:nrow(report)) {
  # grab current name and date if "Agent:"
  if (report[i,1] == 'Agent:') {
    currDate <- report[i+1,2]
    currName=paste(report[i,2:5], collapse=' ')
  # otherwise append the name/date
  } else {
    report[i,'date'] <- currDate
    report[i,'agent'] <- currName
  }
}
write.csv(report, 'test15a.csv')
Strange question, but how do I filter such that all rows of a dataframe are returned? For example, say you have the following dataframe:
Pts <- floor(runif(20, 0, 4))
Name <- c(rep("Adam",5), rep("Ben",5), rep("Charlie",5), rep("Daisy",5))
df <- data.frame(Pts, Name)
And say you want to set up a predetermined filter for this dataframe, for example:
Ptsfilter <- c("2", "1")
Which you will then run through the dataframe, to get your new filtered dataframe
dffil <- df[df$Pts %in% Ptsfilter, ]
At times, however, you don't want the dataframe to be filtered at all, and in the interests of automation and minimising workload, you don't want to have to go back and remove/comment-out every instance of this filter. You just want to be able to adjust the Ptsfilter value such that no rows will be filtered out of the dataframe, when that line of code is run.
I have experimented/guessed with things like:
Ptsfilter <- c("")
Ptsfilter <- c(" ")
Ptsfilter <- c()
to no avail.
Is there a value I can enter for Ptsfilter that will achieve this goal?
You might need to define a function to do this for you.
filterDF = function(df,filter){
  if(length(filter)>0){
    return(df[df$Pts %in% filter, ])
  }
  else{
    return(df)
  }
}
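The same idea can also be written as a logical mask, so the subsetting line itself stays unchanged. This is only a sketch: keep_rows is a made-up helper name, and the small df below is fixed sample data rather than the random data from the question.

```r
Pts  <- c(0, 1, 2, 3, 2, 1)
Name <- c("Adam", "Adam", "Ben", "Ben", "Charlie", "Daisy")
df   <- data.frame(Pts, Name)

# TRUE for every row when the filter is empty, otherwise TRUE only for matches.
keep_rows <- function(values, filter) {
  length(filter) == 0 | values %in% filter
}

Ptsfilter <- c("2", "1")
dffil <- df[keep_rows(df$Pts, Ptsfilter), ]  # keeps the 4 rows with Pts 1 or 2

Ptsfilter <- c()
dfall <- df[keep_rows(df$Pts, Ptsfilter), ]  # keeps all 6 rows
```

With an empty filter, length(filter) == 0 is TRUE and gets recycled across all rows, which is why setting Ptsfilter <- c() switches the filter off.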
I have 1000 files with similar column names. For example:
df1
DATE PRICE CLOSE
df2
DATE PRICE CLOSE
and so on...
If I merge them by date, they do get merged, but the columns retain their old names, and I want to rename them in a loop.
So the merged data set looks like this:
Date Price Close PRICE CLOSE
I want something like
DATE PRICE1 CLOSE1 PRICE2 CLOSE2.
Is there any easy way to do it?
I have tried a couple of things, which are not giving me the correct output.
This is using the plyr package:
mod_join = function(mypath){
filenames=list.files(path=mypath, full.names=TRUE)
datalist = lapply(filenames, function(x){read.csv(file=x,header=T)[,c('Date','High','Low')]})
join_all(datalist,by = "Date")
}
This is using the merge command on all data frames:
merge2 = function(mypath){
filenames=list.files(path=mypath, full.names=TRUE)
datalist = lapply(filenames, function(x){read.csv(file=x,header=T)[,c('Date','High','Low')]})
Reduce(function(x,y) {merge(x,y,by.x= "Date",by.y = "Date",all=T)}, datalist)
}
I also tried using a for loop, subsetting each data frame and then merging them subsequently, but somehow it's not subsetting the data frames:
for (i in 1:1000){
data_subset <- sprintf('data_%d',i)
mydata_subset <- data.frame(,data_subset["Date"],data_subset["High"],data_subset["DayLow"])
obj_name <- paste('subset_Pricedata',i,sep ="_")
assign(obj_name,value = mydata_subset)
}
Any help will be great.
Thanks
Hopefully, this will do your job:
library(plyr)
df1 = rename(df1,c("PRICE"="PRICE1","CLOSE"="CLOSE1"))
df2 = rename(df2,c("PRICE"="PRICE2","CLOSE"="CLOSE2"))
new = merge(df1,df2,all=TRUE)
Please comment if you face any difficulties.
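With 1000 files, renaming each data frame by hand is not practical, so here is a sketch that applies the same renaming in a loop and then merges everything. The three small data frames in datalist are hypothetical stand-ins for the files; in practice the list would come from lapply over list.files with read.csv.

```r
# Hypothetical stand-ins for the files read from disk.
datalist <- list(
  data.frame(DATE = c("2020-01-01", "2020-01-02"), PRICE = c(10, 11), CLOSE = c(9, 10)),
  data.frame(DATE = c("2020-01-01", "2020-01-02"), PRICE = c(20, 21), CLOSE = c(19, 20)),
  data.frame(DATE = c("2020-01-02", "2020-01-03"), PRICE = c(30, 31), CLOSE = c(29, 30))
)

# Append the list position to every column except the merge key.
renamed <- lapply(seq_along(datalist), function(i) {
  d <- datalist[[i]]
  not_key <- names(d) != "DATE"
  names(d)[not_key] <- paste0(names(d)[not_key], i)
  d
})

# Full outer join of all data frames on DATE.
merged <- Reduce(function(x, y) merge(x, y, by = "DATE", all = TRUE), renamed)
names(merged)
# "DATE" "PRICE1" "CLOSE1" "PRICE2" "CLOSE2" "PRICE3" "CLOSE3"
```

Because the columns are renamed before merging, no two data frames share a non-key column name, so merge never has to disambiguate them.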
What about this approach?
It should be fast as it uses data.table and its fread-function
library(data.table)
merge2 <- function(mypath){
  filenames <- list.files(path=mypath, full.names=TRUE)
  fileslist <- lapply(filenames, function(nam){
    # reads the file
    file <- fread(nam)
    setnames(file, 2, "price") # renames the second col to "price"
    setnames(file, 3, "close") # third to "close"
    return(file)
  })
  dat <- rbindlist(fileslist)
  return(dat)
}
EDIT
I just realised that you want to merge your data instead of having it in the long format. What you can do is just add a variable with a name to the data.table "file" before returning the file by adding:
file[, varnam := nam]
and then cast the final data.table "dat" before returning it, using the reshape2 library and its dcast function.
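A rough sketch of that cast step, using small made-up tables instead of files. Note that data.table ships its own dcast method, which accepts several value.var columns at once, so that is what is used here rather than reshape2; the list names a and b stand in for the file names, and idcol plays the role of the varnam column.

```r
library(data.table)

# Hypothetical stand-ins for two files read with fread().
fileslist <- list(
  a = data.table(DATE = c("2020-01-01", "2020-01-02"), price = c(10, 11), close = c(9, 10)),
  b = data.table(DATE = c("2020-01-01", "2020-01-02"), price = c(20, 21), close = c(19, 20))
)

# idcol adds a column holding the list names, identifying the source of each row.
dat <- rbindlist(fileslist, idcol = "varnam")

# One row per DATE, one price/close pair of columns per source.
wide <- dcast(dat, DATE ~ varnam, value.var = c("price", "close"))
```

The resulting columns are named after the value column and the source (for example price_a and close_b), which gives the wide layout asked for in the question.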
I had a similar problem. Here's what I ended up using, although there is likely a cleaner way.
The function suffix_col_names will add a suffix to a subset of columns. I use this because I eventually merge week1 and week 2 data on columns 1-10.
#function called suffix_col_names
suffix_col_names<-function(your_df, start_col, end_col, your_str, your_sep){
  for (i in start_col:end_col){
    colnames(your_df)[i]<-paste(colnames(your_df)[i], sep=your_sep,your_str)
  }
  return(your_df)
}
#call function to rename columns in week1 and week2
week_1_data<-suffix_col_names(week1,11,24,"1",".")
week_2_data<-suffix_col_names(week2,11,24,"2",".")
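When only two data frames are merged, base R's merge can add the suffixes itself through its suffixes argument, which applies to non-key columns whose names clash. A minimal sketch with made-up week1 and week2 data (the id, score and hours columns are hypothetical):

```r
# Hypothetical weekly data sharing the key column id.
week1 <- data.frame(id = 1:3, score = c(5, 6, 7), hours = c(1, 2, 3))
week2 <- data.frame(id = 1:3, score = c(8, 9, 10), hours = c(4, 5, 6))

# Clashing non-key columns get ".1" (from week1) and ".2" (from week2).
both <- merge(week1, week2, by = "id", suffixes = c(".1", ".2"))
names(both)
# "id" "score.1" "hours.1" "score.2" "hours.2"
```

This only disambiguates the two inputs of a single merge call, so for a long chain of merges a renaming helper such as the one above is still the more predictable route.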