Drop top row of dataframe and make the second row variable names

I have a list, clean_data_2009, containing 12 monthly data frames named wireless_mmm_YY, where mmm abbreviates the calendar month and YY is the two-digit year (09 for 2009).
I want to drop the first row in each of the 12 data frames and then promote the next row (originally the second) to the variable names. The command below works for one data frame, but I want to write a loop instead.
clean_data_2009$wireless_jan_09 <- clean_data_2009$wireless_jan_09[-1,] %>% row_to_names(row_number = 1)
I wrote a loop that uses paste to build the text of the command R should run, but R does not evaluate that text as code, so I get an error. I tried to fix it with get, but I still run into the error shown below:
month <- c("jan", "feb", "mar", "april", "may", "june", "july", "aug", "sep", "oct", "nov", "dec")
year <- c("09") # "2010", "2011"
list_dt <- c("clean_data_2009$wireless")
rows2del <- c("[-1, ]")
for (y in year) {
for (m in month) {
print(paste(y,m,sep = "_") )
print(paste(list_dt,m,y,sep = "_"))
print(paste(paste(list_dt,m,y,sep = "_"),rows2del, sep=""))
get(paste(list_dt,m,y,sep = "_")) <- get(paste(paste(list_dt,m,y,sep = "_"),rows2del, sep="")) %>% row_to_names(row_number = 1)
}
}
Error:
[1] "09_jan"
[1] "clean_data_2009$wireless_jan_09"
[1] "clean_data_2009$wireless_jan_09[-1, ]"
Error in get(paste(paste(list_dt, m, y, sep = "_"), rows2del, sep = "")) :
object 'clean_data_2009$wireless_jan_09[-1, ]' not found

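This alternative approach might help. If you already have your frames in a list, you can loop over them with lapply(), use indexing to drop the rows you no longer need, and set the names in a single setNames() call for each frame. Assign the result back to the list, because lapply() returns a new list rather than modifying the original. To match the command in the question, the version below drops the first two rows and uses the values of the original second row as the names:
clean_data_2009 <- lapply(clean_data_2009, \(d) setNames(d[-(1:2), ], unlist(d[2, ])))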
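If an explicit loop is still preferred, the reason get() fails is that it looks up a single object by name; it does not evaluate an expression such as clean_data_2009$wireless_jan_09[-1, ]. A minimal sketch of the loop using [[ ]] with a pasted name instead is shown here. The toy list and its column contents are made up for illustration and only stand in for the real clean_data_2009.

```r
# Hypothetical stand-in for the real list of monthly data frames
clean_data_2009 <- list(
  wireless_jan_09 = data.frame(X1 = c("Report A", "id", "1", "2"),
                               X2 = c("", "value", "10", "20"),
                               stringsAsFactors = FALSE),
  wireless_feb_09 = data.frame(X1 = c("Report B", "id", "3"),
                               X2 = c("", "value", "30"),
                               stringsAsFactors = FALSE)
)

month <- c("jan", "feb")  # shortened for the toy example
year  <- c("09")

for (y in year) {
  for (m in month) {
    nm <- paste("wireless", m, y, sep = "_")     # e.g. "wireless_jan_09"
    d  <- clean_data_2009[[nm]]                  # list element looked up by name
    new_names <- unlist(d[2, ], use.names = FALSE)  # original second row
    d <- d[-(1:2), , drop = FALSE]               # drop junk row and header row
    names(d) <- new_names
    rownames(d) <- NULL                          # renumber the remaining rows
    clean_data_2009[[nm]] <- d                   # write back into the list
  }
}
```

Indexing the list with a character name avoids both get() and building code as text.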

Related

Automating importation and naming of a csv file

I am trying to import many csv files from an EPA website. The nomenclature of those csv files is sensible / consistent. Any suggestions on how I can use a loop to automate the importation of the csv files and their naming as dataframes within R?
Right now I'm doing it manually by swapping out the month name in each line of code as illustrated below:
library(tidyverse)
#Download 2013 data
jan_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_jan2013.csv")%>%
add_column("month"="jan","year"=2013)
feb_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_feb2013.csv")%>%
add_column("month"="feb","year"=2013)
mar_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_mar2013.csv")%>%
add_column("month"="mar","year"=2013)
apr_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_apr2013.csv")%>%
add_column("month"="apr","year"=2013)
may_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_may2013.csv")%>%
add_column("month"="may","year"=2013)
jun_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_june2013.csv")%>%
add_column("month"="jun","year"=2013)
jul_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_july2013.csv")%>%
add_column("month"="jul","year"=2013)
aug_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_aug2013.csv")%>%
add_column("month"="aug","year"=2013)
sep_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_sept2013.csv")%>%
add_column("month"="sep","year"=2013)
oct_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_oct2013.csv")%>%
add_column("month"="oct","year"=2013)
nov_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_nov2013.csv")%>%
add_column("month"="nov","year"=2013)
dec_13<-read.csv("https://www.epa.gov/sites/default/files/2017-10/rindata_dec2013.csv")%>%
add_column("month"="dec","year"=2013)
I'd like to set something up where all 12 months are imported, the added column is modified appropriately and the resulting df is named appropriately, by month.
Thanks for the help!

Read all of the csvs using a vector of months and string concatenation, then set their names, enframe, add a year column, and unnest:
months <- c("jan", "feb", "mar", "apr", "may", "june", "july", "aug", "sept", "oct", "nov", "dec")
dfs <- lapply(months, function(x) read.csv(paste0("https://www.epa.gov/sites/default/files/2017-10/rindata_", x, "2013.csv"))) %>%
setNames(months) %>%
enframe(name = "month") %>%
add_column(year = 2013) %>%
unnest(value)
Let me know if this works!
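If separate, named data frames are wanted (jan_13, feb_13, ...) with the short month labels from the question, one possible sketch is below. It keeps a mapping from the label to the spelling used in the file name, since the question's URLs use june, july and sept. The helper names build_urls and read_year are made up for this example, and the 2017-10 directory is only known to hold the 2013 files.

```r
# Names are the labels wanted in the data; values are the file-name spellings
file_month <- c(jan = "jan", feb = "feb", mar = "mar", apr = "apr",
                may = "may", jun = "june", jul = "july", aug = "aug",
                sep = "sept", oct = "oct", nov = "nov", dec = "dec")

base_url <- "https://www.epa.gov/sites/default/files/2017-10/rindata_"

# Hypothetical helper: one URL per month, named by the short label
build_urls <- function(year = 2013) {
  setNames(paste0(base_url, file_month, year, ".csv"), names(file_month))
}

# Hypothetical helper: download each file and tag it with month and year
read_year <- function(year = 2013) {
  urls <- build_urls(year)
  dfs <- lapply(names(urls), function(m) {
    d <- read.csv(urls[[m]])
    d$month <- m
    d$year  <- year
    d
  })
  setNames(dfs, paste0(names(urls), "_", year %% 100))  # "jan_13", "feb_13", ...
}

urls <- build_urls(2013)
```

Calling rin_2013 <- read_year(2013) would then give a named list, so rin_2013$jun_13 plays the role of the jun_13 object in the question.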

Can you use the output of a function with user input to "call" a dataframe?

I have some code in R that invites a user to put in a year between 2010 and 2021.
chosen.year <- readline(promt = "choose year between 2010 and 2021:")
y.chosen.year <- paste("y", chosen.year)
year.input <- gsub(" ", "", y.chosen.year, fixed = TRUE)
The output that is stored in year.input is for e.g. 2015: y2015.
I have a dataframe for each year between 2010 and 2021 that is called y2010, y2011 etc.
Is it possible to later use year.input in another function that would otherwise require me to write y2015 (so that the user can choose a year that will be used later on)?
Example:
myspdf2 <- merge(myspdf1, year.input, by.x = "abc", by.y = "def")
Instead of:
myspdf2 <- merge(myspdf1, y2015, by.x = "abc", by.y = "def")
I tried the method above but it did not work.

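Assuming promt= is a typo for prompt= and is not in your real code, here are two options:
Combine all years into one frame, including the year in the data (if not there already).
years <- ls(pattern = "^y\\d{4}$")
# Map() returns a list, so bind the pieces into one frame; strip the "y" so year matches chosen.year
allyears <- do.call(rbind, Map(
function(x, yr) transform(x, year = yr),
mget(years), sub("^y", "", years)))
subset(allyears, year == chosen.year)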
Combine all years into a list of frames, and subset from there (the list elements are named y2010, y2011, etc., so prepend the "y"):
allyears <- mget(ls(pattern = "^y\\d{4}$"))
allyears[[ paste0("y", chosen.year) ]]
(This assumes that a chosen.year will only reference one of the multiple frames.)
Ultimately I suspect that this is not so much about merge as about subset (one frame) or [[-extraction (list of frames).
A third option that I'm not fond of, but offered to round out the answer:
Just get the data. BTW, you should use either paste0(.) or paste(., sep=""), otherwise you'll get y 2015 instead of y2015. This is much more direct than paste(.) and gsub(" ", "", .).
year.input <- paste0("y", chosen.year)
get(year.input)
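To tie the list option back to the merge call in the question, here is a small sketch under assumed data. The frames y2014, y2015 and myspdf1 and their columns are invented for illustration, and pick_year is a hypothetical helper that fails early when the user types a year with no matching frame.

```r
# Hypothetical yearly frames standing in for y2010 ... y2021
y2014 <- data.frame(def = c("a", "b"), pop = c(1, 2))
y2015 <- data.frame(def = c("a", "b"), pop = c(3, 4))
myspdf1 <- data.frame(abc = c("a", "b"), region = c("North", "South"))

# Collect every object named like y2010, y2011, ... into a named list
allyears <- mget(ls(pattern = "^y\\d{4}$"))

# Hypothetical helper: look up one year's frame, with a clear error if missing
pick_year <- function(chosen.year, frames) {
  key <- paste0("y", chosen.year)
  if (!key %in% names(frames)) {
    stop("no data frame for year ", chosen.year)
  }
  frames[[key]]
}

chosen.year <- "2015"  # interactively this would come from readline(prompt = ...)
myspdf2 <- merge(myspdf1, pick_year(chosen.year, allyears),
                 by.x = "abc", by.y = "def")
```

The point is that merge needs the data frame itself, not the string holding its name, so the lookup has to happen before the call.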

Substring a statement after character matching and year

I am trying to extract certain rows from my dataset based on year, and then trim a string column in those rows according to the following conditions. For 2017, I want to remove everything up to and including the second '-' in the statement; e.g. from "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email" I want only "All-SOV-OutR-Sov_Score-18Dec.Email". For 2018, I want to remove the portion after the '.'; e.g. from "IVP Program Template.IVP Email Template" I want "IVP Program Template".
I have tried using
data$col <- sub(".*:", "", data$`Email Name`)
data$col2 <- substring(data$`Email Name`, regexpr(".", data$`Email Name`) + 1)
but neither works; both return the statements as is. To filter by year I also tried the filter function:
filter(data, as.Date(data$First Activity (EDT)) = "2017")
but it gives me a syntax error.
My dataset is like this:

Here is the regex that should give you the desired result for 2017 values:
sub(".*?-.*?-", "", "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email")
# "All-SOV-OutR-Sov_Score-18Dec.Email"
The one for 2018 values:
sub("\\..*", "", "IVP Program Template.IVP Email Template")
# "IVP Program Template"
You can then apply the regex functions with ifelse:
library(lubridate)
data$email_adj <- NA
data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2017", sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)
data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2018", sub("\\..*", "", data$`Email Name`), data$email_adj)
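If you want to filter by month instead of year, use the month function instead of the year function (in this example I only selected months from April until July). Note that both assignments below test the same month range, so the second overwrites the result of the first; keep only the one whose regex you need: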
library(lubridate)
data$email_adj <- NA
data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)
data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub("\\..*", "", data$`Email Name`), data$email_adj)
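For reference, here is a self-contained base R sketch of the year-based version, including the row filter the question asked about. The two-row data frame is invented, and the m/d/Y date format is an assumption taken from the use of mdy() above. The attempts in the question returned the strings unchanged because ".*:" needs a colon that never occurs, and an unescaped "." in regexpr() matches any character, so the first match is always at position 1.

```r
# Hypothetical two-row sample; the real data and its date format are assumptions
data <- data.frame(
  `Email Name` = c("17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email",
                   "IVP Program Template.IVP Email Template"),
  `First Activity (EDT)` = c("12/18/2017", "03/05/2018"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Extract the year as text from the parsed date
yr <- format(as.Date(data$`First Activity (EDT)`, format = "%m/%d/%Y"), "%Y")

# 2017: drop everything up to and including the second "-"
# 2018: drop everything from the first literal "." onward
data$email_adj <- ifelse(
  yr == "2017", sub("^[^-]*-[^-]*-", "", data$`Email Name`),
  ifelse(yr == "2018", sub("\\..*$", "", data$`Email Name`), NA_character_)
)

# Rows for a single year, using the same yr vector
data_2017 <- data[yr == "2017", ]
```

Rows from any other year get NA in email_adj in this sketch; that choice is arbitrary and easy to change.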

Do if then statement based on values inside data frame or vector

As an example, suppose I have this data:
key <- data.frame(num=c(1,2,3,4,5), month=c("January", "Feb", "March", "Apr", "May"))
data <- c(4,2,5,3)
I want to create a new vector, data2 using the mapping of num to month contained in key. I can do this manually using case_when by doing lots of if statements at once:
library(dplyr)
data2<-case_when(
data==1 ~ "January",
data==2 ~ "Feb",
data==3 ~ "March",
data==4 ~ "Apr",
data==5 ~ "May"
)
However, say that I want to automate this process (maybe I actually have thousands of if statements) and utilize the mapping contained in key. Is this, or something like it, possible?
Here is a failed attempt at code:
data2 <- case_when(data=key$num ~ key$month)
What I am going for is a vector called data2 with these elements: c("Apr","Feb","May","March"). How can I do this?

You can use match and base R indexing (also, set stringsAsFactors=FALSE when you initialize the data.frame, as I did below):
key <- data.frame(num=c(1,2,3,4,5), month=c("January", "Feb", "March", "Apr", "May"), stringsAsFactors = FALSE)
data2 <- key$month[match(data, key$num)]
data2
#[1] "Apr" "Feb" "May" "March"
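A closely related alternative is a named vector used as a lookup table, sketched below. The extra value 9 in data is added here only to show what happens to a value missing from key: it comes back as NA, the same as with match.

```r
key <- data.frame(num = c(1, 2, 3, 4, 5),
                  month = c("January", "Feb", "March", "Apr", "May"),
                  stringsAsFactors = FALSE)
data <- c(4, 2, 5, 3, 9)  # 9 is not in key, added for illustration

# Named vector: names are the keys (as text), values are the months
lookup <- setNames(key$month, key$num)

# Index by name; unname() drops the key labels from the result
data2 <- unname(lookup[as.character(data)])
```

Both approaches scale to thousands of mappings without writing any conditions by hand.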

Why are these object sizes different - R

Why do I get "warning longer object length is not a multiple of shorter object length"?
Forgive me for asking this again, but I am unable to figure out why I am getting this error message - even after combing through stackoverflow. From the above link it says:
"memb only has a length of 10. I'm guessing the length of dih_y2$MemberID isn't a multiple of 10. When using == it will spit out a warning if it isn't a multiple to let you know that it's probably not doing what you're expecting it is doing."
I am getting the same error message from the following code, but I am not sure which "objects" are of different length in my example or how to fix it! Essentially, I am trying to separate my dates into months for analysis. Please help if you can. Thank you.
library(ggplot2)
library(dplyr)
library(statsr)
piccolos2 <- piccolos2 %>%
mutate(SERPDate = as.Date(piccolosRankings$SERPDate, format='%m/%d/%Y'))
piccolos2 <- piccolos2 %>%
mutate(Month = ifelse(as.numeric(SERPDate) %in% 0017-04-01:0017-04-30, "April",
ifelse(as.numeric(SERPDate) %in% 0017-05-01:0017-05-31, "May",
ifelse(as.numeric(SERPDate) %in% 0017-06-01:0017-06-30, "June",
ifelse(as.numeric(SERPDate) %in% 0017-07-01:0017-07-31, "July", "August")))))

piccolos2 <- piccolos2 %>%
mutate(Month = ifelse(as.numeric(SERPDate) %in% as.Date("0017-04-01"):as.Date("0017-04-30"), "April",
ifelse(as.numeric(SERPDate) %in% as.Date("0017-05-01"):as.Date("0017-05-31"), "May",
ifelse(as.numeric(SERPDate) %in% as.Date("0017-06-01"):as.Date("0017-06-30"), "June",
ifelse(as.numeric(SERPDate) %in% as.Date("0017-07-01"):as.Date("0017-07-31"), "July", "August")))))
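Part of the trouble in the original code is that an unquoted 0017-04-01 is plain arithmetic (17 - 4 - 1, i.e. 12), so those ranges are not dates at all. If the goal is only a month label, a shorter route that avoids date ranges entirely might look like the sketch below. The three sample dates are invented, and the format string is the one from the question. If real dates show up with year 0017, a two-digit year parsed with %Y rather than %y could be the cause, but that is a guess.

```r
# Hypothetical sample; the real piccolos2 data is not shown in the question
piccolos2 <- data.frame(
  SERPDate = c("04/15/2017", "05/02/2017", "08/30/2017"),
  stringsAsFactors = FALSE
)

# Parse the text into Date values
piccolos2$SERPDate <- as.Date(piccolos2$SERPDate, format = "%m/%d/%Y")

# Month number from the date, then the built-in English month names
piccolos2$Month <- month.name[as.integer(format(piccolos2$SERPDate, "%m"))]
```

month.name is a constant in base R, so the labels do not depend on the locale.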
