Generate column names dynamically for a dataframe in R - r

I am converting JSON into a data frame, and that part works. Below is my code:
df <- data.frame(t(sapply(json, c)))
colnames(df) <- gsub("X", "y",colnames(df))
This gives me column names like y1, y2, y3, etc. Is it possible to have the column names start from 0 instead, so that they are y0, y1, y2, etc.?

From the comments:
df <- data.frame(t(sapply(json, c)))
colnames(df) <- paste0("y", 0:(ncol(df)-1))
Or if you want padded zeros
a <- seq(0,ncol(df)-1,1)
colnames(df) <- sprintf("y%02d",a)
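As a self-contained sketch of the two suggestions above: the `json` object here is a hypothetical stand-in (a plain list of equal-length vectors), since the asker's real data is not shown. `formatC()` is used for the padded variant as an alternative to `sprintf()`.

```r
# Hypothetical stand-in for the parsed JSON: a list of equal-length vectors
json <- list(c(1, 2, 3), c(4, 5, 6))

df <- data.frame(t(sapply(json, c)))

# Zero-based index, one entry per column
idx <- seq_len(ncol(df)) - 1L

# Plain names: y0, y1, y2
colnames(df) <- paste0("y", idx)
plain_names <- colnames(df)

# Zero-padded names: y00, y01, y02
colnames(df) <- paste0("y", formatC(idx, width = 2, flag = "0"))
padded_names <- colnames(df)
```

Assigning the names directly, rather than running `gsub("X", "y", ...)` on the defaults, also avoids touching any other "X" that might appear in a column name.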

Related

How do I rename a single column in multiple dataframes to the name of the dataframe in which they reside in R?

I am currently trying to rename a single column in multiple dataframes to match the dataframe name in R.
I have seen some questions and solutions on the site that are similar to what I am attempting, but none appear to do this dynamically. I have over 45 dataframes in which I need to rename a column, so manually typing in each individual name is doable but time-consuming.
Dataframe1 <- column
Dataframe2 <- column
Dataframe3 <- column
I want it to look like this:
Dataframe1 <- Dataframe1
Dataframe2 <- Dataframe2
Dataframe3 <- Dataframe3
The ultimate goal is to have a master dataframe with columns Dataframe1, Dataframe2, and Dataframe3
We can get all the datasets into a list and rename at once in the list
lst1 <- lapply(mget(ls(pattern = "Dataframe\\d+")), function(x) {
names(x)[5] <- "newcol"
x})
Update
If we are renaming the columns in different datasets with different names, then create a vector of columns names that corresponds to each 'Dataframe' column name
nm1 <- c("col5A", "col5B", "col5C", ..., "col5Z")
lst2 <- Map(function(x, y) {names(x)[5] <- y; x},
mget(ls(pattern = "Dataframe\\d+")),
nm1)
In the first snippet above, the 5th column of every dataset is renamed to 'newcol'.
It can also be done using tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = "Dataframe\\d+")), ~ .x %>%
rename_at(5, ~ "newcol"))
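The question asks for the column to take the name of its own data frame, which the list names returned by `mget()` already provide. A minimal base R sketch follows; the three small data frames, the `column` name, and the shared `id` key used for merging are all assumptions made for illustration.

```r
# Hypothetical stand-ins for the asker's data frames
Dataframe1 <- data.frame(id = 1:3, column = c(10, 20, 30))
Dataframe2 <- data.frame(id = 1:3, column = c(11, 21, 31))
Dataframe3 <- data.frame(id = 1:3, column = c(12, 22, 32))

# Collect the data frames into a named list
env <- environment()
lst <- mget(ls(env, pattern = "^Dataframe\\d+$"), envir = env)

# Rename "column" in each data frame to that data frame's own name
renamed <- Map(function(x, nm) {
  names(x)[names(x) == "column"] <- nm
  x
}, lst, names(lst))

# Combine into one master data frame (assumes a shared "id" key)
master <- Reduce(function(a, b) merge(a, b, by = "id"), renamed)
```

If the data frames have no common key but the same row order, `do.call(cbind, ...)` on the renamed columns may be a simpler way to build the master data frame.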

Creating json list of lists from dataframe

Very new to R, I have a data.frame of mixed types and need to convert it to a json object that has each row of the data.frame as a list within a list, with the column headers as the first list.
Closest I've come is the below,
library(jsonlite)
df <- data.frame(X=as.numeric(c(1,2,3)),
Y=as.numeric(c(4,5,6)),
Z=c('a', 'b', 'c'),
stringsAsFactors=FALSE)
test <- split(unname(df), 1:NROW(df))
toJSON(test)
Which gives,
{"1":[[1,4,"a"]],"2":[[2,5,"b"]],"3":[[3,6,"c"]]}
If there's some way to remove the keys and flatten the value list by one level I could make this work by adding the colnames, but is there an easier way I'm missing? Output I'd like is,
{[["X","Y","Z"],[1,4,"a"],[2,5,"b"],[3,6,"c"]]}
Thanks for any help!
The general idea is that, to get the JSON format you want, your data needs to be in a list of vectors (2D vectors and 2D lists do not work).
Here is one way. There is probably a more elegant one, but this works. It does turn the numbers into strings, and I can't find a way around that, sorry.
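library(jsonlite)  # provides toJSON
library(rlist)     # provides list.append
df <- data.frame(X=as.numeric(c(1,2,3)),
Y=as.numeric(c(4,5,6)),
Z=c('a', 'b', 'c'),
stringsAsFactors=FALSE)
#make the column names a row and then remove them
names <- colnames(df)
#shift the data down by one row; the parentheses matter because ":" binds tighter than "+"
df[2:(nrow(df)+1),] <- df
df[1,] <- names
colnames(df) <- NULL
#convert the df into a list containing vectors
data <- list()
for(i in seq(1,nrow(df))){
#flatten each one-row data frame into a plain character vector
data <- list.append(data,unlist(df[i,], use.names=FALSE))
}
toJSON(data)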
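The answer above notes that the numbers end up as strings. One possible way around that, sketched here rather than taken from the original answer, is to keep each row as an unnamed list so that mixed types survive, and to let jsonlite's `auto_unbox = TRUE` option turn length-one values into JSON scalars.

```r
library(jsonlite)

df <- data.frame(X = c(1, 2, 3),
                 Y = c(4, 5, 6),
                 Z = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# One unnamed list per row, so numbers stay numeric and strings stay strings
rows <- lapply(seq_len(nrow(df)), function(i) unname(as.list(df[i, ])))

# Column headers first, then the rows
out <- c(list(names(df)), rows)

json <- toJSON(out, auto_unbox = TRUE)
```

The result is a top-level JSON array; the outer `{ }` in the desired output shown in the question would not be valid JSON, so it is not reproduced here.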

Converting List of Vectors to Data Frame in R

I'm trying to convert a list of vectors into a data frame, with there being a column for Company Names and column for the MPE. My list is generated by running the following code for each company:
MPE[[2]] <- c("Google", abs(((forecasted - goog[nrow(goog),]$close)
/ goog[nrow(goog),]$close)*100))
Now I'm having trouble turning it into an appropriate data frame for further manipulation. What's the easiest way to do this?
This is an example list of vectors that I would want to manipulate into a dataframe with the company names in one column and the number in the second column.
test <- list(c("Google", 2))
test[[2]] <- c("Microsoft", 3)
test[[3]] <- c("Apple", 4)
You can use unlist with matrix and then turn the result into a data frame. I think reducing with rbind could take a long time with a large list.
df <- data.frame(matrix(unlist(test), nrow=length(test), byrow=T))
colnames(df) <- c("Company", "MPE")
I was actually able to achieve what I wanted with the following:
MPE_df <- data.frame(Reduce(rbind ,MPE))
colnames(MPE_df) <- c("Company", "MPE")
MPE_df
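Because each vector mixes a name with a number, both approaches above produce character columns. A small sketch of an additional conversion step, using the `test` list from the question and `do.call(rbind, ...)` as a swapped-in alternative to `Reduce`:

```r
test <- list(c("Google", 2), c("Microsoft", 3), c("Apple", 4))

# Bind the vectors row-wise into a character matrix, then into a data frame
df <- as.data.frame(do.call(rbind, test), stringsAsFactors = FALSE)
colnames(df) <- c("Company", "MPE")

# The values were stored as text, so convert MPE back to numeric
df$MPE <- as.numeric(df$MPE)
```

After the conversion, `df$MPE` can be used in arithmetic and sorting like any other numeric column.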

Subsetting efficiently on multiple columns and rows

I am trying to subset my data to drop rows with certain values of certain variables. Suppose I have a data frame df with many columns and rows, I want to drop rows based on the values of variables G1 and G9, and I only want to keep rows where those variables take on values of 1, 2, or 3. In this way, I aim to subset on the same values across multiple variables.
I am trying to do this with few lines of code and in a manner that allows quick changes to the variables or values I would like to use. For example, assuming I start with data frame df and want to end with newdf, which excludes observations where G1 and G9 do not take on values of 1, 2, or 3:
# Naive approach (requires manually changing variables and values in each line of code)
newdf <- df[which(df$G1 %in% c(1,2,3)), ]
newdf <- newdf[which(newdf$G9 %in% c(1,2,3)), ]
# Better approach (requires manually changing variables names in each line of code)
vals <- c(1,2,3)
newdf <- df[which(df$G1 %in% vals), ]
newdf <- newdf[which(newdf$G9 %in% vals), ]
If I wanted to not only subset on G1 and G9 but MANY variables, this manual approach would be time-consuming to modify. I want to simplify this even further by consolidating all of the code into a single line. I know the below is wrong but I am not sure how to implement an alternative.
newdf <- c(1,2,3)
newdf <- c(df$G1, df$G9)
newdf <- df[which(df$vars %in% vals, ]
It is my understanding I want to use apply() but I am not sure how.
You do not need to use which with %in%, because %in% already returns a logical vector that can be used for subsetting directly. How about the below:
keepies <- (df$G1 %in% vals) & (df$G9 %in% vals)
newdf <- df[keepies, ]
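To extend the same idea to many variables without writing one condition per column, the per-column tests can be combined with `Reduce`. This is a sketch under assumed data: the example `df`, the `other` column, and the `vars` vector are made up for illustration.

```r
# Hypothetical example data
df <- data.frame(G1 = c(1, 2, 5, 3),
                 G9 = c(3, 4, 1, 2),
                 other = letters[1:4])

vars <- c("G1", "G9")   # columns to check; edit this vector to change them
vals <- c(1, 2, 3)      # allowed values

# One logical vector per column, combined with a row-wise AND
keep <- Reduce(`&`, lapply(df[vars], function(col) col %in% vals))

newdf <- df[keep, ]
```

Changing the variables or the allowed values then only requires editing `vars` or `vals`.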
Use data.table
First, melt your data
library(data.table)
DT <- melt(as.data.table(df))
Then split into lists
DTLists <- split(DT, list(DT[1:9])) #this is the number of columns that you have.
Now you can operate on the lists recursively using lapply
DTresult <- lapply(DTLists, function(x) {
...
})

R move named column to the end of a data frame

I'm trying to move a column to the end of a data frame and I'm struggling.
output_index <- grep(output, names(df))
df <- cbind(df[,-output_index], df[,output_index])
This orders the data properly, however it converts the data to a matrix, which doesn't work. How can I do this without losing the column names, while keeping the data as a data frame?
Didn't need the , in front of the index:
output_index <- grep(output, names(df))
df <- cbind(df[-output_index], df[output_index])
df <- data.frame(id=1:10, output=rnorm(10,1,1), input=rnorm(10,1,1))
output_index <- grep("output", names(df))
res.df <- cbind(df[-output_index], df[output_index])
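An alternative sketch that avoids `cbind` and `grep` altogether is to reorder the columns by name. `grep` matches partially (it would also pick up a column called, say, "output2"), whereas `setdiff` on the names matches exactly. The example data and the `col` variable are assumptions for illustration.

```r
df <- data.frame(id = 1:3,
                 output = c(0.5, 1.5, 2.5),
                 input = c(10, 20, 30))

col <- "output"  # name of the column to move to the end

# Select all other columns first, then the named column;
# single-bracket list-style indexing keeps the result a data frame
df <- df[c(setdiff(names(df), col), col)]
```

Because only the column order changes, the column names and types are preserved.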
