Setting up and running a function for multiple observations and variables - r

I have a question about setting up and applying a function to some multivariate data.
My data file is set up in Excel with each variable on an individual sheet, and each trajectory as a row of data (100 trajectories in total). The values within each row, across 365 columns, are the measurements of the respective variable over time (daily measurements over 1 year).
I’ve done some analysis of 1 trajectory by setting up my data manually in a separate Excel file, with 16 columns containing the separate variables and 365 rows containing the associated data from each daily measurement. I’ve imported this into R as ‘Traj1’ and set up the function as follows:
> library(moments)  # or e1071; needed for skewness() and kurtosis()
> T1 <- Traj1[, 1:16]
> multi.fun <- function(x) { c(summary(x), sd(x), skewness(x), kurtosis(x), shapiro.test(x)) }
However, I need to do this for 100 trajectories, and this approach is extremely inefficient (both the R and the Excel work).
I’m not sure how best to set this up in R given my initial Excel file layout, or how the function should be written so that I can run it over all trajectories in one batch and export the output to a new Excel file.
Sorry, I am new to programming in general and haven’t had much experience dealing with large data sets. Any help is really appreciated.
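One way this could be organized (a minimal sketch, not a definitive solution): read each variable's sheet from the original workbook, rebuild a 365 x 16 data frame for each trajectory, apply the summary function to every column, and write one sheet of results per trajectory. The file names trajectories.xlsx and trajectory_summaries.xlsx are placeholders, and the sketch assumes the readxl, moments and writexl packages; substitute whatever Excel I/O packages you prefer.

library(readxl)    # read_excel(), excel_sheets()
library(moments)   # skewness(), kurtosis()
library(writexl)   # write_xlsx()

path   <- "trajectories.xlsx"        # placeholder: workbook with one sheet per variable
sheets <- excel_sheets(path)         # 16 variable sheets expected

# Each sheet is 100 rows (trajectories) x 365 columns (days)
vars <- lapply(sheets, function(s) as.data.frame(read_excel(path, sheet = s)))
names(vars) <- sheets

# Summary statistics for one variable (a numeric vector of 365 daily values)
multi.fun <- function(x) {
  c(summary(x), sd = sd(x), skewness = skewness(x),
    kurtosis = kurtosis(x), shapiro.p = shapiro.test(x)$p.value)
}

# For trajectory i, take row i of every variable sheet and treat it as one column
results <- lapply(seq_len(nrow(vars[[1]])), function(i) {
  traj  <- as.data.frame(lapply(vars, function(v) unlist(v[i, ], use.names = FALSE)))
  stats <- sapply(traj, multi.fun)             # one column of statistics per variable
  data.frame(statistic = rownames(stats), stats, check.names = FALSE)
})
names(results) <- paste0("Traj", seq_along(results))

# One output sheet per trajectory
write_xlsx(results, "trajectory_summaries.xlsx")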

Related

Stratification using the caret package - how to deal with dropout

I am using the caret package for stratified randomization in a clinical trial. I have 4 strata and I used StrBCD.ui(path, folder = "myfolder") to create it. I have two questions.
1. Every time I rerun the function and add a new subject's data, it should take into account the data of previously inserted patients, right? Where are these data stored? I cannot find the specific file in the created folder.
2. I might have some dropouts after participants have been assigned to a treatment, and I want to exclude them from the randomization. How do I remove their data from the randomization? I do not want to restart the randomization from scratch, because I want to keep the participants that have already been allocated. In other words, is there a way to have some control over the function and the data it considers for the randomization?
Thank you
Elisabetta
I have tried running the program several times and looking for the data in the created folders.

How to map a unique ID to each lane item with multiple conditions

I have two files with a huge data set; the sample data is below.
I am trying to map the unique BIN number from the 2nd file to the 1st one (output as below).
I am able to do this in Excel using the COUNTIF function with multiple conditions, but I am unable to do it in R. Please help me create code for the same.
Can you explain how to build this logic in R?
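Without the sample data it is hard to be specific, but a multi-condition lookup like this is usually done in R with a join on the matching columns rather than a COUNTIF. A minimal sketch under assumed names: file1.csv and file2.csv are the two files, key1 and key2 are the columns the match has to agree on, and BIN is the column to bring across (all placeholders).

library(dplyr)

# Placeholder file and column names -- replace with the real ones
file1 <- read.csv("file1.csv", stringsAsFactors = FALSE)
file2 <- read.csv("file2.csv", stringsAsFactors = FALSE)

# Attach the BIN number wherever both key columns match
result <- file1 %>%
  left_join(select(file2, key1, key2, BIN), by = c("key1", "key2"))

write.csv(result, "file1_with_BIN.csv", row.names = FALSE)

# Base R equivalent:
# result <- merge(file1, file2[, c("key1", "key2", "BIN")],
#                 by = c("key1", "key2"), all.x = TRUE)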

Is there an R function to find duplicated IDs from a large dataset?

I have a SAS table imported into RStudio with around 2500 observations (patient IDs). It is a table with data from different experiments (3 experiments, or readings), and for some observations/patients more than one experiment was conducted. I want to know which observations have more than one experiment, so that I can filter the data and choose just one experiment per patient. This will also help me see the exact number of unique observations.
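One way to find them (a minimal sketch, assuming the imported table is a data frame called dat with a patient_id column; both names are placeholders):

# Base R: patient IDs that occur in more than one row
dup_ids <- unique(dat$patient_id[duplicated(dat$patient_id)])
length(dup_ids)                        # how many patients have repeated experiments

# All rows belonging to those patients
dat[dat$patient_id %in% dup_ids, ]

# dplyr alternative: count experiments per patient and keep the repeats
library(dplyr)
dat %>%
  count(patient_id, name = "n_experiments") %>%
  filter(n_experiments > 1)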

How to export a list of dataframes in R?

I have a list that consists of a large number of data frames, and every time R crashes I lose the variable and have to recreate it. The problem is that my list of data frames is pretty large and takes several hours to recreate.
Is there any way to save/export this list so I can just load it into R at my convenience (without having to recreate it)?
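A list of data frames is a single R object, so it can be written to disk once and read back in a later session; a minimal sketch (my_list and the file names are placeholders):

# Save the whole list to one compressed file
saveRDS(my_list, file = "my_list.rds")

# In a later session, read it back under any name you like
my_list <- readRDS("my_list.rds")

# Alternative: save() / load() keep the original object name
save(my_list, file = "my_list.RData")
load("my_list.RData")   # restores an object named my_list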

How to Handle Creating Large Data Sets in R

There's a fair amount of support, through things like the various Revolution R modules, for what to do if you're bringing a large dataset into R and it's too large to be stored in RAM. But is there any way to deal with data sets being created within R that are too big to store in RAM, beyond simply (and by hand) breaking the creation step into a series of RAM-sized chunks, writing each chunk to disk, clearing it, and continuing on?
For example, just doing a large simulation, or using something like survSplit() to take a single observation with a survival time from 1 to N and break it into N separate observations?
If you're creating the data in R and you can do your analysis on a small chunk of the total data, then only create as large a chunk as you need for any given analysis.
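If the full data set really does have to exist on disk, the by-hand pattern described above can at least be wrapped in a loop: generate one RAM-sized chunk, append it to a file, drop it, and move on. A minimal sketch in base R (the chunk size, column names and output file are all placeholders standing in for the real creation step):

set.seed(1)
out_file   <- "big_simulation.csv"   # placeholder output file
n_chunks   <- 100
chunk_rows <- 1e5                    # choose a size that fits comfortably in RAM

for (i in seq_len(n_chunks)) {
  # Stand-in for the real creation step: simulate one chunk
  chunk <- data.frame(id = ((i - 1) * chunk_rows + 1):(i * chunk_rows),
                      x  = rnorm(chunk_rows))

  # Append to the on-disk file; write the header only with the first chunk
  write.table(chunk, out_file, sep = ",", row.names = FALSE,
              col.names = (i == 1), append = (i > 1))

  rm(chunk); gc()   # free the chunk before building the next one
}

# The file can later be read back in pieces (read.csv with nrows/skip)
# or with packages designed for larger-than-RAM data.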
