Finding the mean of columns in a data frame in R

I have a vector that contains 50 data frames of re-sampled data. So all of the column names are consistent in each data frame but the numeric values are different. Each data frame consists of 12 rows. How can I find the mean value of each row in one particular column between the 50 data frames and place the 12 mean values into a new one column data frame?

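If you want the mean of a specific column from every data frame in your list, collected into a data frame of its own, you can use dplyr and purrr. Note that this returns one mean per data frame (50 values), not one mean per row position (12 values).
library(dplyr)
library(purrr)
# your_list: the list of data frames; "column_name": the column to average
map2_df(your_list, "column_name", ~summarize_at(.x, .y, mean))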
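For the 12 row-wise means the question asks for (one mean per row position, taken across the 50 data frames), a minimal base R sketch could look like the following. The example data and the names a, b, col_matrix, and row_means are made up for illustration; substitute your own list and column name.

```r
# Hypothetical example data: 50 data frames, 12 rows each, same column names
set.seed(1)
your_list <- replicate(
  50,
  data.frame(a = rnorm(12), b = runif(12)),
  simplify = FALSE
)

# Pull the chosen column out of every data frame; sapply binds the
# 50 vectors of length 12 into a 12 x 50 matrix
col_matrix <- sapply(your_list, function(d) d[["a"]])

# Average across the 50 data frames for each of the 12 row positions
row_means <- data.frame(mean_a = rowMeans(col_matrix))
```

This assumes the rows are in the same order in every data frame, so that row i means the same thing in each one.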

Related

R _ Make combinations of multiple dataframes & columns of different size

My first question here...
I have 2 dataframes, both with a different number of rows.
The first one has 3 columns, the second one has 1 column.
I want to make all combinations of values from the 1st column of the 1st dataframe with values in the 1st (and only) column of the second dataframe, and values of 2nd column of 1st dataframe with values in 1st (and only) column of second dataframe, and so on...
I assume the result will be a one-column dataframe (?).
Something like this:
Attempts with combn did not help me yet...
Thanks!
This is probably not exactly what you want, but it provides a starting point. Assuming your first dataframe is called df and the other one (with one column) is called df2:
#make data long using tidyr
df_long <- tidyr::pivot_longer(df, cols = c("loc1", "loc2", "loc3"))
#cartesian join with the codes column; CJ comes from data.table and expects vectors, so pass the column rather than df2 itself
data.table::CJ(df_long$value, df2[[1]])
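If you would rather avoid extra packages, the same idea can be sketched in base R with expand.grid. The data below is invented for illustration (the column names loc1 to loc3 follow the answer above, and code is a hypothetical name for the single column of df2), and pasting the pairs with "_" is just one way to get a one-column result.

```r
# Hypothetical example data; the two data frames have different row counts
df <- data.frame(
  loc1 = c("a", "b"),
  loc2 = c("c", "d"),
  loc3 = c("e", "f"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(code = c("x", "y", "z"), stringsAsFactors = FALSE)

# Flatten all columns of df into a single vector of values
values <- unlist(df, use.names = FALSE)

# Every pairing of a value from df with a value from df2
combos <- expand.grid(value = values, code = df2[[1]], stringsAsFactors = FALSE)

# Collapse each pair into one string to get a one-column data frame
combined <- data.frame(combination = paste(combos$value, combos$code, sep = "_"))
```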

Create full data frame from possible combinations of grouping variables

I apologize if this has been asked before, but I could not find the answer I needed when there are three grouping variables.
I need to fill a dataframe with possible combinations of variables, but insert NAs for a non-grouping observation values when a combination does not appear. Say there is a dataframe with three grouping variables: Year, Geography, and Grouping:
Year <- rep(2008:2019,each=50)
Geography <- rep(1:60,each=10)
Grouping <- rep(1:4,each=150)
value <- rnorm(600,mean=0,sd=1)
# include all three grouping variables alongside the values
df <- data.frame(Year, Geography, Grouping, value)
But the dataframe is missing some random observations like so:
df2=df[-c(15,60,150,510),]
How would one go about changing the dataframe back into a length of 600 (which is the length it would be if all possible combinations of three grouping variables were present), but inserting NAs where the value would be if the combinations were in the dataframe? Note that all unique observations for each grouping variable are present in the dataset at some point.
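One possible approach, sketched here on a small made-up data set rather than the 600-row one above, is to build every combination of the grouping variables with expand.grid and then left-join the observed rows onto it with merge. The names full, partial, all_combos, and restored are hypothetical, and the sketch assumes each combination appears at most once in the data.

```r
# Hypothetical small example: 3 years x 2 geographies x 2 groups = 12 combinations
full <- expand.grid(Year = 2008:2010, Geography = 1:2, Grouping = 1:2)
full$value <- seq_len(nrow(full))

# Drop two rows to mimic the missing observations
partial <- full[-c(3, 7), ]

# Rebuild every combination from the levels that are present in the data
all_combos <- expand.grid(
  Year = unique(partial$Year),
  Geography = unique(partial$Geography),
  Grouping = unique(partial$Grouping)
)

# Left join: combinations with no observation get NA in value
restored <- merge(
  all_combos, partial,
  by = c("Year", "Geography", "Grouping"),
  all.x = TRUE
)
```

If tidyr is available, its complete() function is intended for the same job, but the base R version above has no dependencies.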

how to divide the value in each cell of a .csv by the value in another cell across multiple rows and variables in R?

I have a .csv file of 39 variables and 713 rows, each containing a count of plastic items. I have another column which is the survey length, and I want to standardise each count of items by a survey length of 100. I am unsure how to create a loop to run through each row and cell individually to do this. Many also have NA values.
Any ideas would be great.
Thank you.
Consider applying the formula directly to the columns, with no need for a loop. This assumes the survey length column is named survey_length; NA counts simply stay NA:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
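As a small worked example of the same calculation, the sketch below uses invented data (bottles and bags are hypothetical item columns) and relies on R recycling the survey_length vector down each column, so the intermediate matrix is not strictly needed.

```r
# Hypothetical example data with some NA counts
df <- data.frame(
  bottles = c(4, NA, 10),
  bags = c(2, 6, NA),
  survey_length = c(50, 200, 100)
)

# Every column except the survey length is a count to standardise
vars <- setdiff(names(df), "survey_length")

# Divide each count by its row's survey length, then scale to a length of 100
df[vars] <- df[vars] / df$survey_length * 100
```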

how to create subsets of a data.frame in R?

I have two data frames, one with 94 rows and 167 columns (df_1) and the other with 94 rows and 1 column (df_2). I would like to create 167 different data frames, each pairing one column of the first data frame with the single column of the second. I have tried a for loop like the following:
for (i in seq_len(ncol(df_1))){
df_[[i]] <- data.frame(df_1[sort(rownames(df_1)),i,df_2[sort(rownames(df_2)),])
}
But it does not work, can someone help me?
If the two data frames share column names, I think you can combine them with smartbind from gtools:
library(gtools)
df3 <- smartbind(df1, df2) # df1 has 167 columns, df2 has 1 column
This gives you a single data frame. To create the individual data frames, use the approach from the thread below:
How to loop through the columns in an R data frame and create a new data frame using the column name in each iteration?
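Alternatively, a minimal base R sketch that builds the separate data frames directly is to loop over the column names of df_1 with lapply and pair each column with df_2. The data here is invented and much smaller than in the question, df_list is a hypothetical name, and the sketch assumes the rows of the two data frames are already in matching order.

```r
# Hypothetical example data: 3 columns instead of 167, 3 rows instead of 94
df_1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
df_2 <- data.frame(y = c(10, 20, 30))

# One data frame per column of df_1, each combined with the column of df_2
df_list <- lapply(names(df_1), function(col) {
  data.frame(df_1[col], df_2)
})

# Name each list element after the df_1 column it contains
names(df_list) <- names(df_1)
```

Keeping the results in a named list (df_list$b, for instance) is usually easier to work with than creating 167 separate variables.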

When renaming a subset of columns in R dataframe, the first column data is duplicated

I have a data frame of 1530 columns, the first 800 of which I want to subset to have a common name and the last 730 of which I want to subset to have a common name. My attempts to rename the first subset have been colnames(dataframe)[1:800] <- c("NonNorm") and colnames(dataframe)[1:800] <- rep("NonNorm", 800).
However, when I view the altered data frame, all of the values in the rows of the first column have been duplicated across all of the columns through 800. Why is this happening and how do I preserve the original values while changing the column names?
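One likely explanation is that the data itself is unchanged: when several columns share a name, name-based lookups (and some viewers) resolve to the first matching column, so it only looks duplicated. The sketch below, on a small invented data frame, shows that positional access still returns the original values and that giving the columns unique names, such as a shared prefix plus an index, avoids the problem.

```r
# Hypothetical example data: 3 columns instead of 1530
dataframe <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Give the first two columns the same name
colnames(dataframe)[1:2] <- rep("NonNorm", 2)

# Access by position still returns the original second column
second_by_position <- dataframe[[2]]

# Access by name returns the first match only
first_by_name <- dataframe[["NonNorm"]]

# Unique names with a common prefix keep every column addressable
colnames(dataframe)[1:2] <- paste0("NonNorm_", 1:2)
```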
