I'm sure there are much better ways of doing this; I'm open to suggestions.
I have these vectors:
vkt1 <- c("df1", "df2", "df3")
vector2 <- paste("sample", vkt1, sep="_")
The first vector contains a list of the names of dataframes stored in the environment. These are stored as strings, but I'd like to call them as variable names.
The second vector is just the first one with "sample" added at the beginning, equivalent to:
vector2 <- c('sample_df1', 'sample_df2', 'sample_df3')
These strings from vector2 would serve as the names of new data frames to be created.
Alrighty, so now I want to do something like this:
for (i in 1:length(vkt1)){ # meaning for i in 1, 2, 3
vector2[i] = data.frame(which(eval(parse(text = vkt1[i])) == "Some_String", arr.ind=TRUE))
addStyle(wb, vkt1[i], cols = 1:ncol(eval(parse(text = vkt1[i]))), rows = vector2[[i]][,1]+1, style = duppedStyle, gridExpand = TRUE)
}
It may look complicated, but the idea is to create data frames named after the strings in vector2, each being the subset of the corresponding data frame from vkt1 where "Some_String" is found.
Then, use that created data frame and add a style to the entire row when said string is present.
vector2[[i]][,1]+1 is intended to expand to sample_df1[,1]+1 (in the first iteration).
Note that I'm using eval(parse(text = vkt1[i])) to get the variables from the strings of vkt1. So, say, eval(parse(text = vkt1[1])) is equal to df1 (the data frame, not the string).
Like this, the code gives the following error:
In file(filename, "r") :
cannot open file 'noCoinColor_Concat': No such file or directory
Been trying to get it working like so, but I'm beginning to feel this approach might be very wrong.
It is easier to manage code and data when you keep them in a list instead of separate dataframes.
You can use mget to get all the dataframes in vkt1 into a list. Let's say you want to search for 'Some_String' in the first column of each dataframe; then you can do:
new_data <- lapply(mget(vkt1), function(df) df[df[[1]] == 'Some_String', ])
I haven't included the addStyle code here because I don't know from which package it is and what it does but you can easily include it in lapply's anonymous function.
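To show how the styling step could slot in, here is a self-contained sketch. The toy df1..df3, and the commented openxlsx call (wb and duppedStyle are assumed to exist, as in the question), are illustrative assumptions, not the asker's actual data:

```r
# Toy stand-ins for the question's df1..df3 (assumed shapes).
df1 <- data.frame(a = c("x", "Some_String"), b = c("y", "z"))
df2 <- data.frame(a = c("Some_String", "q"), b = c("r", "s"))
df3 <- data.frame(a = c("m", "n"), b = c("o", "p"))
vkt1 <- c("df1", "df2", "df3")

# mget returns a named list: "df1" -> df1, ...
dfs <- mget(vkt1)

# Row/column positions of every match, one data frame per element.
match_rows <- lapply(dfs, function(df) {
  data.frame(which(df == "Some_String", arr.ind = TRUE))
})

# The openxlsx call from the question would then go per element, e.g.:
# for (nm in names(dfs)) {
#   openxlsx::addStyle(wb, nm, cols = 1:ncol(dfs[[nm]]),
#                      rows = match_rows[[nm]][, 1] + 1,
#                      style = duppedStyle, gridExpand = TRUE)
# }
```

Working off the list names avoids eval(parse(...)) entirely.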
Is it not easier to combine your data frames into a list and then use apply or map family functions to adjust your data frames?
data(mtcars)
df1 <- mtcars %>% filter(cyl == 4)
df2 <- mtcars %>% filter(cyl == 6)
df3 <- mtcars %>% filter(cyl == 8)
df_old_names <- c("df1", "df2", "df3")
df_new_names <- c("df_cyl_4", "df_cyl_6", "df_cyl_8")
df_list <- lapply(df_old_names, get)
names(df_list) <- df_new_names
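Once the data frames sit in a named list, a single call can operate on all of them. A base-R sketch of the same setup (no dplyr needed):

```r
# Build the named list directly from mtcars subsets.
df_list <- list(df_cyl_4 = mtcars[mtcars$cyl == 4, ],
                df_cyl_6 = mtcars[mtcars$cyl == 6, ],
                df_cyl_8 = mtcars[mtcars$cyl == 8, ])

# One call applies the same operation to all three subsets.
mpg_means <- sapply(df_list, function(d) mean(d$mpg))

# If the individual objects are really needed in the global environment:
# list2env(df_list, .GlobalEnv)
```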
I have an object that contains a list of lab tests, and based on the length of the object I have created a for loop that processes scripts. During each loop, R should create a data frame using the list in that object. Please see below.
library(dplyr)
library(tidyr)

adlb <- data.frame(subjid = c(1:20), aval = c(100:119))
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")
lab_list <- unique(filter(adlb, !is.na(aval))$paramcd)
for (i in 1:length(lab_list))
{
lab_name <- unlist(lab_list)[[i]]
print(lab_name)
**???** <- adlb %>%
dplyr::filter(paramcd == lab_name) %>%
drop_na(aval)
}
When I run it, it should first create a data frame named ALT, followed by AST. What should I replace ??? with?
The only reason I would prefer it this way is that it helps me review the data in question and debug scripts when needed.
Thank you in advance.
I tried lab_name[[i]] and a few other options, but they resulted in either an error or an incorrect data frame name.
I think this might help:
# example dataframes
df1 <- iris
df2 <- mtcars
df3 <- iris
#put them into list
mylist <- list(df1,df2,df3)
#give names to list
names(mylist) <- c("df_name1","df_name2","df_name3")
#put dataframes into global env
list2env(mylist, .GlobalEnv)
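For the ALT/AST question above, the loop can be avoided entirely: split() builds that named list directly from paramcd, and list2env then creates ALT and AST. A base-R sketch:

```r
adlb <- data.frame(subjid = 1:20, aval = 100:119)
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")

# Drop missing avals, then split into a list named by paramcd.
keep <- !is.na(adlb$aval)
lab_split <- split(adlb[keep, ], adlb$paramcd[keep])

# Creates data frames ALT and AST in the global environment.
list2env(lab_split, .GlobalEnv)
```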
So, I have 6 data frames, all look like this (with different values):
Now I want to create a new column in all the data frames for the country. Then I want to convert it into a long df. This is how I am going about it.
dlist<- list(child_mortality,fertility,income_capita,life_expectancy,population)
convertlong <- function(trial){
trial$country <- rownames(trial)
trial <- melt(trial)
colnames(trial)<- c("country","year",trial)
}
for(i in dlist){
convertlong(i)
}
After running this I get:
Using country as id variables
Error in names(x) <- value :
'names' attribute [5] must be the same length as the vector [3]
That's all; it doesn't do the operations on the data frames. I am pretty sure I'm making a stupid mistake, but I looked online on forums and cannot figure it out.
maybe you can replace
trial$country <- rownames(trial)
by
trial <- cbind(trial, rownames(trial))
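A minimal illustration of that replacement; naming the new column explicitly (an assumption, since the original leaves it unnamed) makes later code that refers to `country` work:

```r
# Toy frame shaped like the question's data: countries as rownames.
trial <- data.frame(y1990 = c(1, 2), y1991 = c(3, 4),
                    row.names = c("Chile", "Peru"))

# cbind with a named argument adds the rownames as a proper column.
trial <- cbind(trial, country = rownames(trial))
```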
Here's a tidyverse attempt -
library(tidyverse)
#Put the dataframes in a named list.
dlist<- dplyr::lst(child_mortality, fertility,
income_capita, life_expectancy,population)
#lst is not a typo!!
#Write a function which creates a new column from the rownames
#and gets the data in long format.
#The column name for the 3rd column is passed separately (`col`).
convertlong <- function(trial, col){
trial %>%
rownames_to_column('country') %>%
pivot_longer(cols = -country, names_to = 'year', values_to = col)
}
#Use `imap` to pass each dataframe as well as its name to the function.
dlist <- imap(dlist, convertlong)
#If you want the changes to be reflected for dataframes in global environment.
list2env(dlist, .GlobalEnv)
I have been researching this question on SO and found only solutions for merging list elements into one large data frame. However, I am struggling with unpacking only those elements that meet a certain condition.
df1 <- iris %>% filter(Sepal.Length > 2.5)
df2 <- mtcars %>% filter(qsec > 16)
not_neccessary <- head(diamonds, 10)
not_neccessary2 <- head(beaver1, 12)
data_lists <- list("#123 DATA" = df1, "CON" = not_neccessary2, "#432 DATA" = df2, "COM" = not_neccessary)
My goal is to convert only those list elements that contain "DATA" in their name. I was thinking about writing a loop function within a lapply:
a <- lapply(data_lists, function(x){if (x == "#+[1-9]+_+DATA"){new_df <- as.data.frame(x)}})
It does not work. I also tried a for loop:
for (i in list){
if (i == "#+[1-9]+_+DATA"){
df <- i
}
}
That does not work either.
Is there any effective function that will unpack my list into particular dataframes by certain condition? My R skills are very bad, especially in writing functions, although I am not really new to this language. Sorry about that.
Use grepl/grep to find the elements that have 'DATA' in their name and subset the list.
result <- data_lists[grepl('DATA', names(data_lists))]
#With `grep`
#result <- data_lists[grep('DATA', names(data_lists))]
Using %like% from data.table:
result <- data_lists[names(data_lists) %like% 'DATA']
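If the goal is then to turn the matching elements into standalone data frames, list2env can follow the subset. A self-contained sketch with toy data (the single-column frames are placeholders); make.names is needed because names like "#123 DATA" are not valid object names:

```r
data_lists <- list("#123 DATA" = data.frame(a = 1),
                   "CON"       = data.frame(b = 2),
                   "#432 DATA" = data.frame(c = 3))

# Keep only the elements whose name contains "DATA".
result <- data_lists[grepl("DATA", names(data_lists))]

# "#123 DATA" is not a syntactic name, so sanitize before list2env.
names(result) <- make.names(names(result))
list2env(result, .GlobalEnv)
```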
I have the following dataframes that are stored in a list as a result of using the map() function:
How can I extract the six dataframes from the list? I would like to give the columns of each dataframe different names and then store all the data in a csv file. Or do I not have to extract the dfs from the list for that?
I am not sure exactly what you are looking for, so below are a couple of guesses at your objective:
If you want to extract the data frame as objects in your global environment, then you can do like this:
list2env(setNames(dats1, paste0("df", seq_along(dats1))), envir = .GlobalEnv)
Assuming you are giving the names "col1" and "col2" to two columns of each data frame in your list, maybe this can help you:
dats1 <- lapply(dats1, setNames, c("col1","col2"))
You have a few options
Fake data
library(tidyverse)
df <- tibble(a = 1:9,b = letters[1:9])
x <- list(df,df,df,df)
You can bind dfs and create just one
bind_rows(x)
You can execute your logic on all dfs
logic <- . %>%
mutate(c = a*3)
x %>% map(logic)
You can also name the dfs inside the list
names(x) <- letters[1:4]
bind_rows(x,.id = "id")
I have more than one hundred Excel files that need cleaning, all with the same data structure. The code listed below is what I use to clean a single Excel file. The file names all follow the pattern 'abcdefg.xlsx'.
library('readxl')
df <- read_excel('abc.xlsx', sheet = 'EQuote')
# get the project name
project_name <- df[1,2]
project_name <- gsub(".*:","",project_name)
project_name <- gsub(".* ","",project_name)
# select the needed columns
df <- df[,c(3,4,5,8,16,17,18,19)]
# rename columns
colnames(df)[colnames(df) == 'X__2'] <- 'Product_Models'
colnames(df)[colnames(df) == 'X__3'] <- 'Qty'
colnames(df)[colnames(df) == 'X__4'] <- 'List_Price'
colnames(df)[colnames(df) == 'X__7'] <- 'Net_Price'
colnames(df)[colnames(df) == 'X__15'] <- 'Product_Code'
colnames(df)[colnames(df) == 'X__16'] <- 'Product_Series'
colnames(df)[colnames(df) == 'X__17'] <- 'Product_Group'
colnames(df)[colnames(df) == 'X__18'] <- 'Cat'
# add new column named 'Project_Name', and set value to it
df$project_name <- project_name
# extract rows between two specific characters
begin <- which(df$Product_Models == 'SKU')
end <- which(df$Product_Models == 'Sub Total:')
## set the loop
in_between <- function(df, start, end){
return(df[start:end,])
}
dividers = which(df$Product_Models %in% 'SKU' == TRUE)
df <- lapply(1:(length(dividers)-1), function(x) in_between(df, start =
dividers[x], end = dividers[x+1]))
df <-do.call(rbind, df)
# remove the rows
df <- df[!(df$Product_Models %in% c("SKU","Sub Total:")), ]
# remove rows with NA
df <- df[complete.cases(df),]
# remove part of string after '.'
NeededString <- df$Product_Models
NeededString <- gsub("\\..*", "", NeededString)
df$Product_Models <- NeededString
Then I can get a well-structured dataframe.
Can you help me write code that cleans all the Excel files at once, so I do not need to run this code a hundred times, and then aggregates all the files into one big csv file?
You can use lapply (base R) or map (purrr package) to read and process all of the files with a single set of commands. lapply and map iterate over a vector or list (in this case a list or vector of file names), applying the same code to each element of the vector or list.
For example, in the code below, which uses map (map_df actually, which returns a single data frame, rather than a list of separate data frames), file_names is a vector of file names (or file paths + names, if the files aren't in the working directory). ...all processing steps... is all of the code in your question to process df into the form you desire:
library(tidyverse) # Loads several tidyverse packages, including purrr and dplyr
library(readxl)
single_data_frame = map_df(file_names, function(file) {
  df = read_excel(file, sheet = "EQuote")
  # ... all processing steps ...
  df
})
Now you have a single large data frame, generated from all of your Excel files. You can now save it as a csv file with, for example, write_csv(single_data_frame, "One_large_data_frame.csv").
There are probably other things you can do to simplify your code. For example, to rename the columns of df, you can use the recode function (from dplyr). We demonstrate this below by first changing the names of the built-in mtcars data frame to be similar to the names in your data. Then we use recode to change a few of the names:
# Rename mtcars data frame
set.seed(2)
names(mtcars) = paste0("X__", sample(1:11))
# Look at data frame
head(mtcars)
# Recode three of the column names
names(mtcars) = recode(names(mtcars),
X__1="New.1",
X__5="New.5",
X__9="New.9")
Or, if the order of the names is always the same, you can do (using your data structure):
names(df) = c('Product_Models','Qty','List_Price','Net_Price','Product_Code','Product_Series','Product_Group','Cat')
Alternatively, if your Excel files have column names, you can use the skip argument of read_excel to skip to the header row before reading in the data. That way, you'll get the correct column names directly from the Excel file. Since it looks like you also need to get the project name from the first few rows, you can read just those rows first with a separate call to read_excel and use the range argument, and/or the n_max argument to get only the relevant rows or cells for the project name.