I have the following tibbles:
library(dplyr)
tmp <- tibble()
tmp2 <- tibble()
tmp <- tmp %>% rbind( colSums( y_matrix) )
tmp2 <- tmp2 %>% rbind( proportions( colSums( y_matrix )))
data <- bind_cols(tmp,tmp2)
I want to name the columns of "data" accordingly. The number of columns in tmp and tmp2 changes from time to time, so how can I add column names without defining them one by one?
The expected column names in the output look like this:
c1 c2 c1_prop c2_prop
Is there any way to generate these names automatically?
I don't have enough reputation to comment, so here is a data.table solution; you can always convert the result back with as_tibble(). If this isn't what you were after, could you post an explicit example of the data and the expected output?
library(data.table)
setDT(data)
setnames(data, ncol(tmp)+(1:ncol(tmp2)), paste0(names(tmp),"_prop"))
However, wouldn't it just be better to name the columns correctly before merging?
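Following that suggestion, here is a minimal base R sketch that names the proportion columns before binding, so it works for any number of columns. The small y_matrix is hypothetical example data, and proportions() requires R 4.0.0 or later:

```r
# Hypothetical example data standing in for y_matrix
y_matrix <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("c1", "c2", "c3")))

sums  <- colSums(y_matrix)   # named vector: c1, c2, c3
props <- proportions(sums)   # same names, values sum to 1

# Derive the second set of names from the first, whatever their number
names(props) <- paste0(names(sums), "_prop")

# One-row data frame with columns c1 c2 c3 c1_prop c2_prop c3_prop
data <- as.data.frame(as.list(c(sums, props)))
names(data)
```

Because the "_prop" names are built from names(sums), nothing has to be typed by hand when the number of columns changes.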
As always, apologies for the simple question.
I've got a large dataset and want to convert a specified set of columns to numeric. I can do it, but it's not very elegant, and unless I raise the memory limit it won't run, because the merge exhausts the vector memory!
library(tidyverse)
#Extract column names I want to turn into numeric from data
make_numeric <- data[252:321] %>% select(-c(contains("UNITS"))) %>% colnames()
Here I want to convert the columns listed in make_numeric with as.numeric and insert them straight back into data. I can't do this in one go, so instead I extract the data, convert it, and then merge.
tmp <- data %>% select(record_id, make_numeric)
tmp <- lapply(tmp[2:56], as.numeric)
tmp <- as.data.frame(tmp)
tmp2 <- data %>% select(-make_numeric)
tmp3 <- merge(tmp, tmp2)
I'm certain there must be a better way...
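One likely cause of the memory problem: lapply(tmp[2:56], as.numeric) keeps only columns 2 to 56, so record_id is dropped from tmp. tmp and tmp2 then share no column names, and merge() falls back to the Cartesian product of the two (every row paired with every row). A toy illustration with hypothetical data frames:

```r
# Hypothetical stand-ins: x plays tmp after record_id was dropped, y plays tmp2
x <- data.frame(a = c(1, 2, 3))
y <- data.frame(record_id = 1:4, b = c("p", "q", "r", "s"))

# No column names in common, so merge() has nothing to join on
intersect(names(x), names(y))   # character(0)

m <- merge(x, y)
nrow(m)   # 12 = 3 * 4: every row of x paired with every row of y
```

With hundreds of thousands of rows on each side, that product quickly exhausts memory; keeping record_id in tmp before merging would give merge() a key to join on.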
There is a dplyr solution:
library(tidyverse)
library(dplyr)
#Extract column names I want to turn into numeric from data
make_numeric <- data[252:321] %>% select(-c(contains("UNITS"))) %>% colnames()
#Mutate desired columns to numeric (with dplyr >= 1.0.0, mutate(across(all_of(make_numeric), as.numeric)) is the preferred form)
data <- data %>% mutate_at(vars(make_numeric), as.numeric)
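For comparison, a base R sketch of the same in-place conversion, with no extraction or merge step. The small data frame and the contents of make_numeric are hypothetical:

```r
# Hypothetical data: two text columns that should be numeric, plus an id and a units column
data <- data.frame(
  record_id  = 1:3,
  dose       = c("1.5", "2", "3.25"),
  weight     = c("70", "82.5", "64"),
  dose_UNITS = c("mg", "mg", "mg"),
  stringsAsFactors = FALSE   # avoid factors on R < 4.0, where as.numeric() would return codes
)
make_numeric <- c("dose", "weight")

# Replace only the selected columns; all other columns are left untouched
data[make_numeric] <- lapply(data[make_numeric], as.numeric)

sapply(data, class)
```

Because the assignment writes straight back into data, no second copy is merged in, which also keeps memory use down.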
Does this work?
library(data.table)
#convert to data.table
dt <- as.data.table(data)
#columns to convert (make_numeric from the question), restricted to those present in dt
cols <- intersect(colnames(dt), make_numeric)
#convert those columns to numeric in place (by reference)
dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
Say I download a dataset called econ_dmg.csv:
library(tidyverse)
fileUrl <- "https://raw.githubusercontent.com/MichaelSodeke/DataSets/main/econ_dmg.csv"
download.file(fileUrl, destfile="econ_dmg.csv", method = "curl", mode="wb")
df <- read_csv("econ_dmg.csv")
I then group the rows like this:
df2 <- df %>% group_by(state, max.net.loss)
Next I convert to a list with:
df3 <- group_split(df2)
Now say I want to add an arbitrary data frame to this list:
newDF <- data.frame(state=c("UT", "UT"), evtype=c("FLASH FLOOD", "WINTER STORM"), max.net.loss=c(900, 900))
newDF <- as_tibble(newDF)
Then let's say I want to insert it after the element df3[[20]] of the list df3. I attempted the following, but it failed:
df3 <- append(list(newDF), df3, after=df3[[20]][[3]])
Please explain what I did wrong in my approach to using the after= parameter in the append() function.
Many thanks.
If the goal is to add new rows to one of the groups, use rbind from base R on that list element: extract it with [[, bind the new rows, and assign (<-) the result back to the same element:
df3[[20]] <- rbind(df3[[20]], newDF)
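To address the append() question directly: its signature is append(x, values, after), where x is the list being extended, values is what gets inserted, and after is an integer position. The original call passes x and values the other way round and uses a value from inside df3[[20]] as the position. A sketch on a hypothetical stand-in for df3:

```r
# Hypothetical stand-in for df3: a list of 25 small data frames
df3 <- lapply(1:25, function(i) data.frame(id = i))
newDF <- data.frame(id = 999)

# append(x, values, after): x is the list to extend, values what to insert,
# after an integer position. newDF is wrapped in list() so it goes in as one
# element; a bare data frame would be spliced in column by column.
df3 <- append(df3, list(newDF), after = 20)

length(df3)    # 26
df3[[21]]$id   # 999: the inserted data frame
df3[[22]]$id   # 21: the former 21st element, shifted back by one
```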
I solved my own problem, @akrun. Many thanks.
Suppose I use:
df3[[20]] <- bind_rows(df3[[20]], newDF)
Then I undo the split with do.call():
df4 <- do.call(rbind, df3) %>% as.data.frame()
Then sort in decreasing order of max.net.loss:
df5 <- df4[order(df4$max.net.loss, decreasing=TRUE),]
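The combine-and-sort steps above, sketched on a hypothetical three-element list (the states, event types and values are made up) so the result can be checked by eye:

```r
# Hypothetical miniature stand-in for the list df3
df3 <- list(
  data.frame(state = "AL", evtype = "HAIL",        max.net.loss = 150),
  data.frame(state = "UT", evtype = "FLASH FLOOD", max.net.loss = 900),
  data.frame(state = "TX", evtype = "TORNADO",     max.net.loss = 400)
)

# Stack the list elements back into a single data frame
df4 <- do.call(rbind, df3)

# Order rows by max.net.loss, largest first
df5 <- df4[order(df4$max.net.loss, decreasing = TRUE), ]
df5$state
```

With dplyr, bind_rows(df3) %>% arrange(desc(max.net.loss)) should be an equivalent one-liner.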
Hello, I want to apply the dplyr arrange() function to a column within a for loop, but for some reason it does not work. Here is a minimal example:
for (j in colnames(df1)[3:ncol(df1)]){
# create data frame for each column
t <- select(df1, all_of(j))
t %>% arrange(j)
var_list[[j]] <- t
for (i in var_list[[j]]$Timestep )
### arrange each timestep df by each column once
scenario[[i+1]] <- var_list[[j]][min:max,]
# subset data to the scenarios of interest
}
I guess the problem is that j holds a character string ("variable"), but dplyr's arrange() expects an unquoted column name. I have tried as.name(), paste() and eval(parse()), but none of them worked. Any ideas? Thank you!
This seems to work:
df1 <- mtcars
var_list <- list()
for (j in colnames(df1)[3:ncol(df1)]){
# create data frame for each column
t <- select(df1, all_of(j))
var_list[[j]] <- t %>% arrange_at(1)
for (i in var_list[[j]]$Timestep )
### arrange each timestep df by each column once
scenario[[i+1]] <- var_list[[j]][min:max,]
# subset data to the scenarios of interest
}
var_list
Unfortunately, this does not work. It should sort column j in ascending order. Maybe arrange() is not the best function to use, but order() and sort() do not work either, and I can't see what I am doing wrong. However, I found a workaround using long data format:
library(reshape2)  # assumed source of melt(); data.table also provides one
for (j in colnames(df1)[3:ncol(df1)]){
# create data frame for each timestep
t <- select(df1, Trial, Timestep, all_of(j))
var_list[[j]] <- melt(t, id.vars=c("Trial", "Timestep"))
var_list[[j]] <- var_list[[j]] %>% arrange(value)
}
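Two things stand out in the question's loop: the result of t %>% arrange(j) is never assigned, and j holds a string, so arrange() sorts by a constant rather than by the column. A sketch using dplyr's .data pronoun to look the column up by name, run on mtcars as in the answer above (the Timestep/scenario part is left out because it depends on columns not shown):

```r
library(dplyr)

df1 <- mtcars
var_list <- list()

for (j in colnames(df1)[3:ncol(df1)]) {
  # .data[[j]] refers to the column whose name is stored (as a string) in j,
  # and the sorted result is assigned instead of being discarded
  var_list[[j]] <- df1 %>%
    select(all_of(j)) %>%
    arrange(.data[[j]])
}

head(var_list[["hp"]], 3)
```

For descending order, arrange(desc(.data[[j]])) works the same way.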
I'm trying to modify data frames and am struggling to combine my operations into a for loop. I want to split a data frame by one particular column, append different rows to each subset, and combine the modified subsets into a single data frame again. Let's use the iris data as an example:
library(dplyr)
#Create data frame subsets based on Species column
iris_subs <- split(iris, iris$Species)
#create an empty data frame with the same columns as in iris and one empty row
emptydf <- iris[FALSE,]
emptydf[nrow(emptydf)+1,] <- NA
#create a data frame with sums for each species
iris %>% group_by(Species) %>% summarise_all(sum) -> iris_sums
iris_sums <- iris_sums[,-c(1)] #delete column with species names
#Combine data frames into one data frame with original data, sum for this species and an empty row for each subset
iris_setosa <- bind_rows(iris_subs[1], iris_sums[1,], emptydf)
iris_versicolor <- bind_rows(iris_subs[2], iris_sums[2,], emptydf)
iris_virginica <- bind_rows(iris_subs[3], iris_sums[3,], emptydf)
new_iris <- bind_rows(iris_setosa, iris_versicolor, iris_virginica)
This code does the job. However, I have a couple of hundred data frames that I want to process in this way, and the number of species varies between data frames. How can I automate the last part in a for loop?
I would like something like this:
#empty data frame to store output
new_iris <- iris[FALSE,]
for (i in iris_subs) {
new_iris[i] <- bind_rows(iris_subs[i], iris_sums[i,], emptydf)
new_iris <- merge(new_iris[i])
}
Error in iris_subs[i] : invalid subscript type 'list'
Apart from the error, this is probably way too simple... I'm an R beginner and have searched the net for days now, but cannot find any answer to this. Does anyone have a suggestion for how to achieve this? Thank you for any hints!
We can create a function and apply it to all the data frames. Here is a shorter version of what you were trying to do:
library(dplyr)
repeat_process <- function(df) {
iris_sums <- df %>% group_by(Species) %>% summarise_all(sum) %>% select(-Species)
df %>% bind_rows(iris_sums, emptydf[rep(1:nrow(emptydf), n_distinct(df$Species)), ])
}
Now let's assume you have a list of dataframes
list_df <- list(iris, iris)
You can apply this function to each dataframe in the list
lapply(list_df, repeat_process)
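Note that repeat_process() above appends all the sum rows and all the empty rows after the complete original data, not after each species. A base R sketch that keeps each sum row and blank row directly after its own group, as in the question's new_iris, and works for any number of groups; add_sums_and_blank() and its group_col argument are hypothetical names:

```r
add_sums_and_blank <- function(df, group_col = "Species") {
  num_cols <- vapply(df, is.numeric, logical(1))
  # drop = TRUE skips factor levels that have no rows in this data frame
  blocks <- lapply(split(df, df[[group_col]], drop = TRUE), function(sub) {
    sums_row <- sub[NA_integer_, ]                          # one all-NA row, same columns as sub
    sums_row[num_cols] <- as.list(colSums(sub[num_cols]))   # fill numeric columns with sums
    blank_row <- sub[NA_integer_, ]                         # all-NA spacer row
    rbind(sub, sums_row, blank_row)
  })
  out <- do.call(rbind, blocks)
  rownames(out) <- NULL
  out
}

new_iris <- add_sums_and_blank(iris)

# Hypothetical list of data frames with differing numbers of species
list_df <- list(iris, iris[iris$Species != "setosa", ])
results <- lapply(list_df, add_sums_and_blank)
```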
You can define a function that sums all numeric columns of a data frame (leaving the other columns as NA) and appends this row, followed by an empty row, to the original data frame:
numericCols = sapply(iris,is.numeric)
func = function(df,numCols){
iris_sums <- colSums(df[,numCols])
result <- rep(NA,ncol(df))
names(result) <- colnames(df)
result[names(iris_sums)] <- iris_sums
rbind(df,result,rep(NA,ncol(df)))
}
Then we use purrr to map over each subset and row-bind the results:
library(purrr)
split(iris,iris$Species) %>% map_dfr(func,numCols=numericCols)
I have two separate datasets: one has the column headers and another has the data.
The first one looks like this:
and I want to use its 2nd column as the column headers of the other dataset:
How can I do this? Thank you.
In general you can use colnames(), which returns the column names of a data frame or matrix as a character vector. You can then rename the columns with:
colnames(df) <- listofnames  # character vector, one name per column
It is also possible to rename just one column using [] indexing.
This would rename the first column:
colnames(df2)[1] <- "name"
For your example, take the values from the 2nd column of the first dataset. Try this:
colnames(df2) <- as.character(df1[[2]])
Make sure the number of names matches the number of columns.
The equivalent for rows is rownames().
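A small reproducible sketch of that approach; df1 and df2 are hypothetical stand-ins for the two datasets in the question:

```r
# Hypothetical example data: df1 holds the header names in its 2nd column,
# df2 holds the values
df1 <- data.frame(id = 1:3, label = c("height", "weight", "age"))
df2 <- data.frame(V1 = c(170, 182), V2 = c(65, 80), V3 = c(34, 41))

# One name is needed per column
stopifnot(nrow(df1) == ncol(df2))

# df1[[2]] returns a plain vector for both data frames and tibbles
colnames(df2) <- as.character(df1[[2]])
names(df2)
```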
dplyr way w/ reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. I also used names() rather than colnames(): for a data frame, colnames() simply calls names(), so it's better to cut out the middleman and make it explicit that we're working with a data frame and not a matrix. It's also shorter to type.
You can simply do this:
names(data)[3]<- 'Newlabel'
where the 3 in names(data)[3] is the position of the column you want to rename.
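The same assignment also works when you select the column by its current name instead of its position, which keeps working if the column order changes. A sketch on a hypothetical data frame:

```r
# Hypothetical example data
data <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Rename by position, as above
names(data)[3] <- "Newlabel"

# Rename by current name instead of position
names(data)[names(data) == "b"] <- "Other"

names(data)
```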