How to get the indices of the original data after it is split using torch.utils.data.random_split? - torch

I have a dataset that I want to split using torch.utils.data.random_split(), and I want to know which indices of the original data end up in each of the new subsets.
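One possible approach, sketched here under assumptions: random_split returns torch.utils.data.Subset objects, and each Subset keeps the positions it drew from the original dataset in its indices attribute (and the original dataset itself in its dataset attribute). The toy TensorDataset, the 7/3 split sizes, and the seed below are made up for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 10 samples with one feature each.
data = torch.arange(10).unsqueeze(1)
dataset = TensorDataset(data)

# A seeded generator makes the split reproducible (the seed is arbitrary).
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(dataset, [7, 3], generator=generator)

# Each split is a Subset; .indices holds positions in the original dataset.
train_idx = list(train_set.indices)
val_idx = list(val_set.indices)

print("train indices:", train_idx)
print("val indices:", val_idx)

# The i-th item of a split is the original item at position indices[i].
first_from_split = train_set[0][0]
first_from_original = dataset[train_idx[0]][0]
print(torch.equal(first_from_split, first_from_original))
```

The two index lists together cover every position of the original dataset exactly once, so they can also be saved and reused later with torch.utils.data.Subset(dataset, indices) to rebuild the same split.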

Related

Data extraction in R - multiple columns

Hello, I have a table consisting of a single row and several columns. I tried some code to extract my KD_PL parameters, without success. Is there a way in R to extract all the KD_PL columns and store them in a vector or data frame?
I tried this:
KDPL <- select("KD_PL.", which(substr(colnames(max_LnData), start=1, stop=6)))
This should do the trick:
library(tidyverse)
KDPL <- max_LnData %>% select(starts_with("KD_PL."))
This function selects all columns from your old dataset starting with "KD_PL." and stores them in a new dataframe KDPL.
If you only want the names of the columns to be saved, you could use the following:
KDPL_names <- colnames(KDPL)
This saves the column names in the vector KDPL_names.

How do I apply the same changes on multiple data frames in R?

I have an object (named subdatlob) containing a list of data frames (four data frames, named 1, 2, 3, and 4). For each data frame, I want to run the following:
tri_I=as.triangle(subdatlob[["I"]],origin="AY",dev="DY",value="paid")
triLoB_I = incr2cum(tri_I)
for I = 1,2,3,4 or more generally, for each data frame I in the given list.
How do I do this? I will also be doing this step for a list containing 1,000,000+ data frames.
This inquiry involves applying a function to every data frame and naming the necessary variables for the computation.
@shs's suggestion
lapply(subdatlob, \(x) incr2cum(as.triangle(x,origin="AY",dev="DY",value="paid")))
worked for me and for my larger list containing more data frames.

How to dynamically create and name data frames in a for loop

I am trying to generate data frame subsets for each respondent in a data frame using a for loop.
I have a large data frame with columns titled "StandardCorrect", "NameProper", "StartTime", "EndTime", "AScore", and "StandardScore" and several thousand rows.
I want to make a subset data frame for each person's name so I can generate statistics for each respondent.
I tried using a for loop
for(name in 1:length(NamesList)){ name <- DigiNONA[DigiNONA$NameProper == NamesList[name], ] }
NamesList is just a list containing all the levels of NameProper (which is a factor variable).
On each iteration, all I want the loop to do is generate a new data frame named after "NamesList[name]", containing the subset of the main data frame where NameProper matches the name in the list for that iteration.
This seems like it should be simple, but I can't figure out how to get R to dynamically generate data frames with a different name on each iteration.
Any advice would be appreciated, thank you.
The advice to use assign for this purpose is technically feasible, but it is widely discouraged by experienced R users. Instead, create a single list with named elements, each of which contains the data for a single individual. That way you don't need to keep a separate data object with the names of the resulting objects for later access.
named_Dlist <- setNames( split( DigiNONA, DigiNONA$NameProper),
NamesList)
This would allow you to access individual dataframes within the named_Dlist object:
named_Dlist[[ NamesList[1] ]] # The dataframe with the first person in that NamesList vector.
It's probably better to use the term list only for true R lists and not for atomic character vectors.

Row names showing up as row numbers in R

This probably has a simple fix, but I'm relatively new to using R and could use some assistance.
The toy data I'm using for a gene network analysis has rows that look like this: [image not preserved]
whereas the data that I've uploaded has rows that look like this: [image not preserved]
The code I'm using refers to the row names to map on as gene names. I am able to successfully run this analysis, however, the output I end up with has lists of row numbers where there should be lists of gene names.
Is there a simple way that I can convert my data into the toy data format so that the row names are gene names instead of numbers?

How do I merge 2 data frames on R based on 2 columns?

I am looking to merge 2 data frames based on 2 columns in R. The two data frames are called popr and droppedcol, and they share the same 2 variables, USUBJID and TRTAG2N, which are the variables I want to combine the 2 data frames by.
The merge function works when I am only trying to do it based off of one column:
merged <- merge(popr,droppedcol,by="USUBJID")
When I attempt to merge by using 2 columns and view the data frame "Duration", the table is empty and there are no values, only column headers. It says "no data available in table".
I am tasked with replicating the SAS code for this in R:
data duration;
set pop combined1 ;
by usubjid trtag2n;
run;
In R, I have tried the following:
duration<- merge(popr,droppedcol,by.x="USUBJID","TRTAG2N",by.y="USUBJID","TRTAG2N")
duration <- full_join(popr,droppedcol,by = c("USUBJID","TRTAG2N"))
duration <- merge(popr,droppedcol,by = c("USUBJID","TRTAG2N"))
I would like to see a data frame with the columns USUBJID, TRTAG2N, TRTAG2, and FUDURAG2, sorted by first FUDURAG2 and then USUBJID.
Per the SAS documentation, Combining SAS Data Sets, and as confirmed by the SAS guru, @Tom, in the comments above, set with by simply means you are interleaving the data sets. No merge (which, by the way, is also a SAS statement that you do not use) is taking place:
Interleaving uses a SET statement and a BY statement to combine
multiple data sets into one new data set. The number of observations
in the new data set is the sum of the number of observations from the
original data sets. However, the observations in the new data set are
arranged by the values of the BY variable or variables and, within
each BY group, by the order of the data sets in which they occur. You
can interleave data sets either by using a BY variable or by using an
index.
Therefore, the best translation of set without by in R is rbind(), and set with by is rbind + order (on the rows):
duration <- rbind(pop, combined1) # STACK DFs
duration <- with(duration, duration[order(usubjid, trtag2n),]) # ORDER ROWS
Note, however, that rbind does not allow unmatched columns between the concatenated data sets. Third-party packages do allow unmatched columns, including plyr::rbind.fill, dplyr::bind_rows, and data.table::rbindlist.
