Converting List of Vectors to Data Frame in R

I'm trying to convert a list of vectors into a data frame, with one column for the company name and one column for the MPE. My list is generated by running the following code for each company:
MPE[[2]] <- c("Google", abs(((forecasted - goog[nrow(goog), ]$close)
                             / goog[nrow(goog), ]$close) * 100))
Now I'm having trouble turning it into the appropriate data frame for further manipulation. What's the easiest way to do this?
This is an example list of vectors that I would want to manipulate into a dataframe with the company names in one column and the number in the second column.
test <- list(c("Google", 2))
test[[2]] <- c("Microsoft", 3)
test[[3]] <- c("Apple", 4)

You can use unlist with matrix and then turn the result into a data frame. Reducing with rbind could take a long time on a large list, I think.
df <- data.frame(matrix(unlist(test), nrow = length(test), byrow = TRUE))
colnames(df) <- c("Company", "MPE")
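One caveat worth noting: unlist() flattens the mixed vectors into a single character vector, so both columns come out as character and the MPE column usually needs converting back to numeric. A minimal sketch of the full pipeline:

```r
# Each vector mixes a name and a number, so c() already coerced the
# number to character; the matrix/unlist route keeps it that way.
test <- list(c("Google", 2), c("Microsoft", 3), c("Apple", 4))
df <- data.frame(matrix(unlist(test), nrow = length(test), byrow = TRUE),
                 stringsAsFactors = FALSE)
colnames(df) <- c("Company", "MPE")
df$MPE <- as.numeric(df$MPE)   # restore the numeric type
```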

I was actually able to achieve what I wanted with the following:
MPE_df <- data.frame(Reduce(rbind, MPE))
colnames(MPE_df) <- c("Company", "MPE")
MPE_df
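As a side note, do.call(rbind, ...) performs the same reduction in a single call and is typically faster than Reduce(rbind, ...) on long lists, since it binds everything at once instead of pairwise. A sketch using the example list from the question:

```r
test <- list(c("Google", 2), c("Microsoft", 3), c("Apple", 4))
# do.call passes all list elements to rbind in one shot
MPE_df <- data.frame(do.call(rbind, test), stringsAsFactors = FALSE)
colnames(MPE_df) <- c("Company", "MPE")
MPE_df$MPE <- as.numeric(MPE_df$MPE)   # the vectors were character, so convert back
```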

Related

Creating json list of lists from dataframe

Very new to R, I have a data.frame of mixed types and need to convert it to a json object that has each row of the data.frame as a list within a list, with the column headers as the first list.
Closest I've come is the below,
library(jsonlite)
df <- data.frame(X = as.numeric(c(1, 2, 3)),
                 Y = as.numeric(c(4, 5, 6)),
                 Z = c('a', 'b', 'c'),
                 stringsAsFactors = FALSE)
test <- split(unname(df), 1:NROW(df))
toJSON(test)
Which gives,
{"1":[[1,4,"a"]],"2":[[2,5,"b"]],"3":[[3,6,"c"]]}
If there's some way to remove the keys and flatten the value list by one level I could make this work by adding the colnames, but is there an easier way I'm missing? Output I'd like is,
{[["X","Y","Z"],[1,4,"a"],[2,5,"b"],[3,6,"c"]]}
Thanks for any help!
The general idea: to get the JSON format you want, your data needs to be in a list of vectors (2-D vectors and 2-D lists do not work).
Here is one way; there is probably a more elegant one, but this works (it does turn the numbers into strings, and I can't find a way around that, sorry).
library(jsonlite)
library(rlist)
df <- data.frame(X = as.numeric(c(1, 2, 3)),
                 Y = as.numeric(c(4, 5, 6)),
                 Z = c('a', 'b', 'c'),
                 stringsAsFactors = FALSE)
# make the column names a row and then remove them
names <- colnames(df)
df[2:(nrow(df) + 1), ] <- df  # parentheses matter: 2:nrow(df)+1 would pick the wrong rows
df[1, ] <- names
colnames(df) <- NULL
# convert the df into a list containing vectors
data <- list()
for (i in seq(1, nrow(df))) {
  data <- list.append(data, as.vector(df[i, ]))
}
toJSON(data)
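An alternative sketch without rlist: build the list of unnamed row-lists directly in base R, with the column names as the first element, and hand that to jsonlite. Numbers then stay numeric, since nothing is written back into a character data frame. The exact JSON rendering depends on jsonlite's auto_unbox behaviour, so treat the commented output as something to verify:

```r
df <- data.frame(X = as.numeric(c(1, 2, 3)),
                 Y = as.numeric(c(4, 5, 6)),
                 Z = c('a', 'b', 'c'),
                 stringsAsFactors = FALSE)
# First element: the column names; then one unnamed list per row,
# so each row keeps its per-cell types.
out <- c(list(colnames(df)),
         lapply(seq_len(nrow(df)), function(i) unname(as.list(df[i, ]))))
# jsonlite::toJSON(out, auto_unbox = TRUE)
# should give [["X","Y","Z"],[1,4,"a"],[2,5,"b"],[3,6,"c"]]
```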

Find difference of same column names across different data frames in a list in R

I have a list of data frames with the same column names, where each data frame corresponds to a month:
June_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(100,200,250,450), Metric2=c(1000,2000,5000,6000))
July_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(140,250,125,400), Metric2=c(2000,3000,2000,3000))
Aug_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(200,150,250,600), Metric2=c(1500,2000,4000,2000))
Sep_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(500,500,1000,100), Metric2=c(500,4000,6000,8000))
lst1 <- list(Aug_2018,June_2018,July_2018,Sep_2018)
names(lst1) <- c("Aug_2018","June_2018","July_2018","Sep_2018")
I intend to create two new columns in each of the data frames in the list, Percent_Change_Metric1 and Percent_Change_Metric2, using the calculation below:
for (i in names(lst1)) {
  lst1[[i]]$Percent_Change_Metric1 <- ((lst1[[i+1]]$Metric1 - lst1[[i]]$Metric1) * 100 / lst1[[i]]$Metric1)
  lst1[[i]]$Percent_Change_Metric2 <- ((lst1[[i+1]]$Metric2 - lst1[[i]]$Metric2) * 100 / lst1[[i]]$Metric2)
}
However, i iterates over names(lst1), so lst1[[i+1]] obviously won't work.
Also, the data frames in my list are in random order, not ordered by month-year, so subtracting successive data frames' columns wouldn't be accurate anyway.
Please advise on:
- how to add the Percent_Change_Metric1 and Percent_Change_Metric2 columns
- how to choose the data frame corresponding to the next month, so the Percent_Change is correct
Thanks for the guidance
Here is one option with base R (note that this pairs consecutive list elements, so lst1 must already be in chronological order):
lst1[-length(lst1)] <- Map(function(x, y)
  transform(y,
            Percent_Change_Metric1 = (x$Metric1 - Metric1) * 100 / Metric1,
            Percent_Change_Metric2 = (x$Metric2 - Metric2) * 100 / Metric2),
  lst1[-1], lst1[-length(lst1)])
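Since the asker's list arrives in random order, it should be sorted chronologically before running the Map. Assuming the Month_Year naming scheme from the question, one base-R sketch matches the first three letters of each month name against month.abb:

```r
nm <- c("Aug_2018", "June_2018", "July_2018", "Sep_2018")
parts <- strsplit(nm, "_", fixed = TRUE)
# month.abb holds "Jan".."Dec"; the first three letters of a month name match it
month_no <- match(substr(vapply(parts, `[[`, "", 1), 1, 3), month.abb)
year_no  <- as.integer(vapply(parts, `[[`, "", 2))
nm_sorted <- nm[order(year_no, month_no)]
# lst1 <- lst1[nm_sorted]   # reorder the list before calling Map
```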

Can't reorder data frame columns by matching column names given in another column

I'm trying to re-order the variables of my data frame using the contents of a variable in another data frame but it's not working and I don't know why.
Any help would be appreciated!
# Starting point
df_main <- data.frame(coat = c(1:5), hanger = c(1:5), book = c(1:5),
                      bottle = c(1:5), wall = c(1:5))
df_order <- data.frame(order_var = c("wall", "book", "hanger", "coat", "bottle"),
                       number_var = c(1:5))
# Goal
df_goal <- data.frame(wall = c(1:5), book = c(1:5), hanger = c(1:5),
                      coat = c(1:5), bottle = c(1:5))
# Attempt
df_attempt <- df_main[df_order$order_var]
In your df_order, put stringsAsFactors = FALSE in the data.frame call (from R 4.0.0 onward this is already the default).
The issue is that you have the order stored as a factor; if you change it to a character vector it will work:
df_goal <- df_main[as.character(df_order$order_var)]
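To see why the factor version misbehaves rather than erroring: indexing with a factor uses its underlying integer codes (the positions in the alphabetically sorted levels), not the labels, so the columns are picked by those numbers instead of by name. A small sketch:

```r
# A factor stores alphabetical level codes, not the original strings:
f <- factor(c("wall", "book", "hanger", "coat", "bottle"))
as.integer(f)   # 5 1 4 3 2: positions in the *alphabetical* level order
# so df_main[f] selects columns by those numbers, not by name
```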

Finding the closest character string in a second data frame in R

I have a quite big data.frame with outdated names, and I want to get the correct names that are stored in another data.frame.
I am using stringdist function to find the closest match between the two columns and then I want to put the new names in the original data.frame.
I am using a code based on sapply function, as in the following example :
dat1 <- data.frame("name" = paste0("abc", seq(1:5)),
                   "value" = round(rnorm(5), 1))
dat2 <- data.frame("name" = paste0("abd", seq(1:5)),
                   "other_info" = seq(11:15))
dat1$name2 <- sapply(dat1$name,
                     function(x) {
                       char_min <- stringdist::stringdist(x, dat2$name)
                       dat2[which.min(char_min), "name"]
                     })
dat1
However, this code is too slow considering the size of my data.frame.
Is there a more optimized alternative solution, using for example data.table R package?
First load data.table and convert the data frames into data tables:
library(data.table)
dat1 <- data.table(dat1)
dat2 <- data.table(dat2)
Then use ":=" with stringdist::amatch to create a new column holding the approximate matches. Note that amatch's maxDist defaults to 0.1, so it needs to be raised or every lookup will return NA:
dat1[, name2 := dat2[stringdist::amatch(name, dat2$name, maxDist = Inf)]$name]
This should be much faster than the sapply version. Hope this helps!
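If installing stringdist is not an option, base R's adist() can do the same job: it returns the full matrix of edit distances between the two name vectors, and which.min per row picks the closest candidate. A sketch on the toy data from the question:

```r
dat1 <- data.frame(name = paste0("abc", 1:5), stringsAsFactors = FALSE)
dat2 <- data.frame(name = paste0("abd", 1:5), stringsAsFactors = FALSE)
d <- adist(dat1$name, dat2$name)             # 5 x 5 matrix of edit distances
dat1$name2 <- dat2$name[apply(d, 1, which.min)]
```

This materialises an n-by-m distance matrix, so for very large tables the amatch route (which avoids that) is likely the better fit.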

R: t tests on rows of 2 dataframes

I have two data frames and I would like to do independent two-group t-tests on the rows (i.e. t.test(y1, y2), where y1 is a row in dataframe1 and y2 is the matching row in dataframe2).
What's the best way of accomplishing this?
EDIT:
I just found the format dataframe1[i,] / dataframe2[i,], which will work in a loop. Is that the best solution?
The approach you outlined is reasonable; just make sure to preallocate your storage vector. I'd double-check that you really want to compare the rows instead of the columns: most datasets I work with have each row as a unit of observation, with the columns representing the separate responses/variables of interest. Regardless, it's your data, so if that's what you need to do, here's an approach:
# Fake data
df1 <- data.frame(matrix(runif(100), 10))
df2 <- data.frame(matrix(runif(100), 10))
# Preallocate results
testresults <- vector("list", nrow(df1))
# For loop (coerce each row to a numeric vector, since t.test() wants atomic input)
for (j in seq(nrow(df1))) {
  testresults[[j]] <- t.test(as.numeric(df1[j, ]), as.numeric(df2[j, ]))
}
You now have a list that is as long as the number of rows in df1. I would then recommend using lapply and sapply to extract pieces of each test result from the list object.
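For example, each element of the list is an htest object, so one sapply call collects a given component across all tests. A sketch (the rows are coerced with as.numeric so t.test gets atomic vectors):

```r
set.seed(42)
df1 <- data.frame(matrix(runif(100), 10))
df2 <- data.frame(matrix(runif(100), 10))
testresults <- vector("list", nrow(df1))
for (j in seq(nrow(df1))) {
  testresults[[j]] <- t.test(as.numeric(df1[j, ]), as.numeric(df2[j, ]))
}
pvals <- sapply(testresults, `[[`, "p.value")   # one p-value per row
```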
It would make more sense to have your data stored as columns.
You can transpose a data.frame by
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))
Then you can use mapply to cycle through the two data.frames a column at a time
t.test_results <- mapply(t.test, x= df1_t, y = df2_t, SIMPLIFY = F)
Or you could use Map, which is a thin wrapper around mapply with SIMPLIFY = FALSE (thus saving keystrokes!):
t.test_results <- Map(t.test, x = df1_t, y = df2_t)
