Graphing Values from multiple H5/HDF5 files at once - r

I've figured out how to read and name multiple H5 files from my directory, but I'm running into trouble actually graphing them. My problem is twofold: with this type of file, I do not know how to make the columns have the same number of rows, and I do not know how to refer to specific files.
My initial setup is as follows:
library("rhdf5")
library("ggplot2")
library("fs")
library("tidyverse")
wd <- "D:/Data/1282-1329/"
setwd(wd)
testh5 <- H5Fopen("1282.h5")
H5Fclose(testh5)
y <- h5read(file = "1282.h5",
name = "/Signal")
x <- h5read(file = "1282.h5",
name = "/Scan")
The / refers to the H5 file's 'Group' and Signal or Scan refers to the 'Name'. Reading "/Signal" from every file therefore gives a list of numeric arrays with a length of 48 (the number of files within 1282-1329). I make one list for each of these by doing
file_paths <- fs::dir_ls("D:/Data/1282-1329/H5")
file_paths
file_Scan <- list()
for (i in seq_along(file_paths)) {
file_Scan[[i]] <- h5read(
file = file_paths[[i]],
name = "/Scan"
)
}
file_Signal <- list()
for (i in seq_along(file_paths)) {
file_Signal[[i]] <- h5read(
file = file_paths[[i]],
name = "/Signal"
)
}
file_Scan <- setNames(file_Scan, file_paths)
file_Signal <- setNames(file_Signal, file_paths)
Thus str(file_Signal) gives me something like:
List of 48
$ D:/Data/1282-1329/H5/1282.h5: num [1:8044(1d)] 11569527 11576106 10848312 11007212 11074822 ...
$ D:/Data/1282-1329/H5/1283.h5: num [1:8045(1d)] 9746633 9886735 10000637 9617273 ...
So my first problem here is [1:8044(1d)] and [1:8045(1d)] - they're one row off. But I'm unable to add in NAs or make the lengths the same as I would a normal list. Is it because I'm thinking about this wrong? I feel like the solution is simple.
My ultimate goal will be to create multiple single plots for each of these files in the directory using something like
for (i in seq_along(file_paths)) {
plots[[i]] = ggplot(file_paths, aes(x=file_Signal, y=file_Scan))+
geom_point(size=1)
}
Then use these to create a rolling gif of the files with Even numbers (1282, 1284, 1286, etc) and Odd numbers (1283, 1285, 1287, etc.)
Thank you for any help or resources you might have to offer.
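One possible way to approach both problems is sketched below. It uses simulated 1-d arrays in place of the h5read() results, and the file numbers, lengths, and values are made up. It also assumes that Scan and Signal have the same length within any one file, which the question does not confirm. Under that assumption no padding is needed for per-file plots, because each file can keep its own number of rows in a long data frame. Padding with NA is shown separately in case a rectangular table is wanted.

```r
# Simulated stand-ins for the lists built with h5read() in the question.
set.seed(1)
file_ids <- c("1282", "1283", "1284", "1285")   # hypothetical file numbers
n_rows   <- c(8, 9, 8, 9)                        # lengths differ between files
file_Scan   <- setNames(lapply(n_rows, function(n) array(seq_len(n), dim = n)), file_ids)
file_Signal <- setNames(lapply(n_rows, function(n) array(rnorm(n, 1e7, 5e5), dim = n)), file_ids)

# Long format: one row per point, one column naming the file.
# as.numeric() drops the 1-d array attribute shown as "(1d)" by str().
long <- do.call(rbind, lapply(names(file_Signal), function(id) {
  data.frame(file   = id,
             Scan   = as.numeric(file_Scan[[id]]),
             Signal = as.numeric(file_Signal[[id]]))
}))

# Wide format, only if really needed: pad shorter vectors with NA.
max_len <- max(lengths(file_Signal))
padded <- sapply(file_Signal, function(v) {
  v <- as.numeric(v)
  length(v) <- max_len   # extending the length fills with NA
  v
})

# Label even- and odd-numbered files for the two animations.
long$parity <- ifelse(as.integer(long$file) %% 2 == 0, "even", "odd")

# One plot per file; skipped when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plots <- lapply(split(long, long$file), function(d) {
    ggplot2::ggplot(d, ggplot2::aes(x = Scan, y = Signal)) +
      ggplot2::geom_point(size = 1) +
      ggplot2::ggtitle(d$file[1])
  })
}
```

With the real data the list names are full paths, so something like tools::file_path_sans_ext(basename(file_paths)) could be used to recover the file number before testing for even or odd.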

Related

Creating a for loop in R from a list

I'm trying to create a for loop in R to iterate through a list of genetic variants, labeled with rsIDs, and filter the results by patient ID.
ace2_snps <- c("rs4646121", "rs4646127", "rs1996225", "rs2158082", "rs4830974", "rs148271868", "rs113539251", "rs4646135", "rs4646179", "rs2301693", "rs16980031", "rs12689012", "rs4646141", "rs142049267", "rs16979971", "rs12007623", "rs4646182", "rs147214574", "rs6632677", "rs139469582", "rs149000434", "rs148805807", "rs112032651", "rs144314464", "rs147077778", "rs182259051", "rs112621533", "rs35803318", "rs35304868", "rs113848176", "rs145345877", "rs12009805", "rs233570", "rs73635824", "rs73635823", "rs4646142", "rs4646157", "rs2074192", "rs79878075", "rs144239059", "rs67635467", "rs183583165", "rs137910448", "rs116419580", "rs2097723", "rs4646170")
for (snps in ace2_snps) {
genotype_snps <- as.data.frame(bgen_ACE2$data[snps,,])
idfromcsv <- read.csv("/Users/keeseyyyyy/Desktop/Walley/pospatid.csv")
id <- as.character(idfromcsv[[1]])
filtered_snps <- genotype_snps[id,] }
I need to run genotype_rs146217251 <- as.data.frame(bgen_ACE2$data["rs146217251",,]) for each rsID, and then I'd like to label the variable filtered_snps according to its rsID in the place of "snps" in the variable name for each variant.
I'm not very familiar with R syntax. Can anyone give me some tips?
For one variant, the process would go like this:
genotype_rs146217251 <- as.data.frame(bgen_ACE2$data["rs146217251",,])
idfromcsv <- read.csv("/Users/keeseyyyyy/Desktop/Walley/pospatid.csv")
id <- as.character(idfromcsv[[1]])
filtered <- genotype_rs146217251[id,]
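A common alternative to creating one variable per rsID is to store each result in a named list, so the rsID becomes the element name. The sketch below is only an illustration: the mock bgen_ACE2 object, the sample IDs, and the genotype column names are invented, and it assumes that bgen_ACE2$data is a variants x samples x genotypes array with dimnames, as the indexing in the question suggests.

```r
# Mock of the structure assumed for bgen_ACE2$data (not real genotype data).
ace2_snps  <- c("rs4646121", "rs4646127", "rs1996225")
sample_ids <- c("1001", "1002", "1003", "1004")          # hypothetical sample IDs
bgen_ACE2 <- list(data = array(
  runif(length(ace2_snps) * length(sample_ids) * 3),
  dim = c(length(ace2_snps), length(sample_ids), 3),
  dimnames = list(ace2_snps, sample_ids, c("g0", "g1", "g2"))
))

# In the real script the IDs would be read once with read.csv(),
# outside the loop, since the file does not change between variants.
id <- c("1002", "1004")                                   # hypothetical patient IDs

filtered_snps <- list()
for (snp in ace2_snps) {
  genotype <- as.data.frame(bgen_ACE2$data[snp, , ])
  # drop = FALSE keeps a data frame even if only one row matches
  filtered_snps[[snp]] <- genotype[id, , drop = FALSE]
}

# Results are then looked up by rsID.
filtered_snps[["rs4646127"]]
```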

how to write out multiple files in R?

I am a newbie R user, and I have a question about writing out multiple files with different names. Let's say that my data has the following structure:
IV_HAR_m1<-matrix(rnorm(1:100), ncol=30, nrow = 2000)
DV_HAR_m1<-matrix(rnorm(1:100), ncol=10, nrow = 2000)
I am trying to estimate multiple LASSO regressions. At first, I stored the iterations in one object called Dinamic_betas. This object was saved in a single file, which receives the required information each time my code iterates.
To do this I was using stew, which belongs to the pomp package, but the whole process takes 5 or 6 days and I am worried about a power outage or a failure of my computer.
Now I want to save each iteration's environment in its own .Rnd file, but I do not know how to do that. The code I am using is the following:
library(glmnet)
library(Matrix)
library(pomp)
space <- 7 #THE NUMBER OF FILES THAT I would WANT TO CREATE
Dinamic_betas<-array(NA, c(10, 31, (nrow(IV_HAR_m1)-space)))
dimnames(Dinamic_betas) <- list(NULL, NULL)
set.seed(12345)
stew( # stew saves the environment in a .Rnd file
file = "Dinamic_LASSO_RD",{ # The name required by stew for creating one file with all information
for (i in 1:dim(Dinamic_betas)[3]) {
tryCatch( # print messages
expr = {
cv_dinamic <- cv.glmnet(IV_HAR_m1[i:(space+i-1),],
DV_HAR_m1[i:(space+i-1),], alpha = 1, family = "mgaussian", thresh=1e-08, maxit=10^9)
LASSO_estimation_dinamic<- glmnet(IV_HAR_m1[i:(space+i-1),], DV_HAR_m1[i:(space+i-1),],
alpha = 1, lambda = cv_dinamic$lambda.min, family = "mgaussian")
coefs <- as.matrix(do.call(cbind, coef(LASSO_estimation_dinamic)))
Dinamic_betas[,,i] <- t(coefs)
},
error = function(e){
message("Caught an error!")
print(e)
},
warning = function(w){
message("Caught a warning!")
print(w)
},
finally = {
message("All done, quitting.")
}
)
if (i%%400==0) {print(i)}
}
}
)
If someone can suggest another package that stores the outputs in different files, I will be grateful.
Try adding this just before the close of your loop
save.image(paste0("Results_iteration_",i,".RData"))
This should save your entire workspace to disk for every iteration. You can then use load() to load the workspace of every environment. Let me know if this works.
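If saving the whole workspace on every iteration turns out to be slow or large, a lighter variant is to write only each iteration's result with saveRDS() and to skip iterations whose file already exists, so an interrupted run can be restarted. This is a sketch under assumptions: fit_window() is a made-up stand-in for the cv.glmnet()/glmnet() step, and the output folder and file names are hypothetical.

```r
# Hypothetical output folder; a temporary directory is used so the sketch runs anywhere.
out_dir <- file.path(tempdir(), "lasso_results")
dir.create(out_dir, showWarnings = FALSE)

# Stand-in for the model fit on window i; returns a small coefficient matrix.
fit_window <- function(i) {
  set.seed(i)
  matrix(rnorm(6), nrow = 2)
}

n_iter <- 5
for (i in seq_len(n_iter)) {
  out_file <- file.path(out_dir, sprintf("betas_%05d.rds", i))
  if (file.exists(out_file)) next    # finished in an earlier run, so skip it
  saveRDS(fit_window(i), out_file)   # one small file per iteration
}

# Afterwards, read the pieces back and stack them into a 3-d array.
files <- sort(list.files(out_dir, pattern = "^betas_[0-9]+\\.rds$", full.names = TRUE))
betas <- simplify2array(lapply(files, readRDS))
dim(betas)
```

The zero-padded file names keep the files in iteration order when sorted alphabetically.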

Creating a 3-dimensional array of excel files in R

I have the following MWE, although the datasets are unavailable:
N <- 84 #Number of datasets to pull data from
dates <- c("2010.01", "2010.02", "2010.03", "2010.04", "2010.05", "2010.06", "2010.07", "2010.08",
"2010.09", "2010.10", "2010.11", "2010.12", "2011.01", "2011.02", "2011.03", "2011.04", "2011.05",
"2011.06", "2011.07", "2011.08", "2011.09", "2011.10", "2011.11", "2011.12", "2012.01", "2012.02",
"2012.03", "2012.04", "2012.05", "2012.06", "2012.07", "2012.08", "2012.09", "2012.10", "2012.11",
"2012.12", "2013.01", "2013.02", "2013.03", "2013.04", "2013.05", "2013.06", "2013.07", "2013.08",
"2013.09", "2013.10", "2013.11", "2013.12", "2014.01", "2014.02", "2014.03", "2014.04", "2014.05",
"2014.06", "2014.07", "2014.08", "2014.09", "2014.10", "2014.11", "2014.12", "2015.01", "2015.02",
"2015.03", "2015.04", "2015.05", "2015.06", "2015.07", "2015.08", "2015.09", "2015.10", "2015.11",
"2015.12", "2016.01", "2016.02", "2016.03", "2016.04", "2016.05", "2016.06", "2016.07", "2016.08",
"2016.09", "2016.10", "2016.11", "2016.12") #list of all dates to loop through
A <- list() #empty list to store excel files
for (k in seq_along(dates)) {
A[k] <- read_excel(paste0("~/R/data.", dates[k], ".xlsx"), range = "B3:EO94")
}
Total <- array(unlist(A), dim=c(91,144,84))
First <- read_excel(paste0("~/R/data.", dates[1], ".xlsx"), range="B3:EO94")
This gives me Total, the 3-dimensional array, and First, which should be the first "slice" of the array. So if I take some arbitrary coordinate, say 15,34, I should be able to pull the exact same value from both Total and First, so I try the following:
> Total[15,34,1]
[1] 0.000392432
> A1[15,34]
# A tibble: 1 x 1
`-97.5`
<dbl>
1 0.000384
The 0.000384 is the proper number found in the excel file from AI18 and the number given from Total is incorrect. What gives? To further double-check I compared Total[15,34,2] with the second "slice" and alas, the same incorrect result from Total.
Try using the double square brackets, A[[k]] to assign the data from the Excel files.
A <- list() #empty list to store excel files
for (k in seq_along(dates)) {
A[[k]] <- read_excel(paste0("~/R/data.", dates[k], ".xlsx"), range = "B3:EO94")
}
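The difference between the two kinds of brackets can be seen on a small example. The sketch below uses tiny made-up data frames in place of the read_excel() results. Assigning a data frame with single brackets stores only its first column (R emits a warning, suppressed here), while double brackets store the whole table.

```r
# Made-up 3 x 2 tables standing in for the Excel sheets.
make_sheet <- function(k) data.frame(a = k + 1:3, b = k + 4:6)

A_single <- list()
A_double <- list()
for (k in 1:2) {
  # Single brackets: only the first column of the data frame is kept.
  suppressWarnings(A_single[k] <- make_sheet(k))
  # Double brackets: the whole data frame becomes element k.
  A_double[[k]] <- make_sheet(k)
}

length(A_single[[1]])   # a bare column vector of length 3
dim(A_double[[1]])      # the full table, 3 rows by 2 columns

# With complete tables, unlist() yields the values column by column,
# sheet by sheet, which matches how array() fills its dimensions.
Total <- array(unlist(A_double), dim = c(3, 2, 2))
Total[2, 2, 1] == A_double[[1]][2, 2]
```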

How to create a loop that changes part of a column name in a data frame

I am trying to find Cronbach's Alpha for survey data containing a series of multi-item measures. Rather than have to manually write out every single multi-item measure, it looks like something a loop should be able to manage far more effectively, but it needs to change only part of the column name, according to the question number.
The basic idea as it currently sits in my head would be...
for (N in 4:22) {
ytqN <- data.frame(YT_Data$QNa, YT_Data$QNb, YT_Data$QNc)
alpha(ytqN)
}
The loop would then create new data frames for each multi item measure and run Cronbach's Alpha as it goes.
This doesn't work though. :(
ytq4 <- data.frame(YT_Data$Q4a, YT_Data$Q4b, YT_Data$Q4c)
alpha(ytq4)
ytq5 <- data.frame(YT_Data$Q5a, YT_Data$Q5b, YT_Data$Q5c)
alpha(ytq5)
ytq6 <- data.frame(YT_Data$Q6a, YT_Data$Q6b, YT_Data$Q6c)
alpha(ytq6)
ytq7 <- data.frame(YT_Data$Q7a, YT_Data$Q7b, YT_Data$Q7c)
alpha(ytq7)
ytq8 <- data.frame(YT_Data$Q8a, YT_Data$Q8b, YT_Data$Q8c)
alpha(ytq8)
ytq9 <- data.frame(YT_Data$Q9a, YT_Data$Q9b, YT_Data$Q9c)
alpha(ytq9)
ytq10 <- data.frame(YT_Data$Q10a, YT_Data$Q10b, YT_Data$Q10c)
alpha(ytq10)
ytq11 <- data.frame(YT_Data$Q11a, YT_Data$Q11b, YT_Data$Q11c)
alpha(ytq11)
ytq12 <- data.frame(YT_Data$Q12a, YT_Data$Q12b, YT_Data$Q12c)
alpha(ytq12)
ytq13 <- data.frame(YT_Data$Q13a, YT_Data$Q13b, YT_Data$Q13c)
alpha(ytq13)
ytq14 <- data.frame(YT_Data$Q14a, YT_Data$Q14b, YT_Data$Q14c)
alpha(ytq14)
ytq15 <- data.frame(YT_Data$Q15a, YT_Data$Q15b, YT_Data$Q15c)
alpha(ytq15)
ytq16 <- data.frame(YT_Data$Q16a, YT_Data$Q16b, YT_Data$Q16c)
alpha(ytq16)
ytq17 <- data.frame(YT_Data$Q17a, YT_Data$Q17b, YT_Data$Q17c)
alpha(ytq17)
ytq18 <- data.frame(YT_Data$Q18a, YT_Data$Q18b, YT_Data$Q18c)
alpha(ytq18)
ytq19 <- data.frame(8 - YT_Data$Q19a, YT_Data$Q19b, YT_Data$Q19c)
# Reverse code Q19a
alpha(ytq19)
ytq20 <- data.frame(YT_Data$Q20a, YT_Data$Q20b, YT_Data$Q20c)
alpha(ytq20)
ytq21 <- data.frame(YT_Data$Q21a, YT_Data$Q21b, YT_Data$Q21c)
alpha(ytq21)
ytq22 <- data.frame(YT_Data$Q22a, YT_Data$Q22b, YT_Data$Q22c)
alpha(ytq22)
The desired results would be a single output containing all the Cronbach's Alphas for the multi item measures for questions 4-22 in the data set I am currently working on executed via a single piece of code, rather than have to go question by question.
It's easier to help if you include your data, but I guess this should work:
alpha_list = list()
for(N in 4:22){
ytq = data.frame(YT_Data[paste0("Q",N,"a")],
YT_Data[paste0("Q",N,"b")],
YT_Data[paste0("Q",N,"c")])
alpha_list[[N]] = alpha(ytq)
}
We are using paste0() to create the column names while looping on N. alpha_list will be a list with the results given by alpha()
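Two details may be worth handling in such a loop: indexing the list by N leaves positions 1 to 3 empty, and the question reverse-codes one item (Q19a, as 8 minus the score). The sketch below shows one way to cover both. It runs on simulated data with only questions 4 to 6, treats Q5a as the reverse-coded item purely for illustration, and computes Cronbach's alpha from its textbook formula in a hand-written helper (cronbach_alpha is not a library function) so that it runs without the package that provides alpha().

```r
set.seed(42)
n <- 50
YT_Data <- data.frame(id = seq_len(n))
for (N in 4:6) {
  base <- rnorm(n)   # shared component that makes the three items correlate
  for (s in c("a", "b", "c")) {
    YT_Data[[paste0("Q", N, s)]] <- base + rnorm(n, sd = 0.5)
  }
}
YT_Data$Q5a <- 8 - YT_Data$Q5a      # pretend Q5a was asked in reverse

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total).
cronbach_alpha <- function(items) {
  k <- ncol(items)
  (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

reverse_items <- c("Q5a")           # hypothetical; Q19a in the question

alphas <- sapply(4:6, function(N) {
  cols  <- paste0("Q", N, c("a", "b", "c"))
  items <- YT_Data[cols]
  for (col in intersect(cols, reverse_items)) {
    items[[col]] <- 8 - items[[col]]   # undo the reverse coding
  }
  cronbach_alpha(items)
})
names(alphas) <- paste0("Q", 4:6)   # results are looked up by question name
alphas
```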

For Loop in R, all in 1 command

I created this random time series:
MM=1584
Z0<-rnorm(MM,8,1.0)#;ts.plot(Z0)
s_1=1.50; p_1=121; p_2=240
s_2=1.25; p_3=361; p_4=480
s_3=1.10; p_5=601; p_6=720
s_4=1.50; p_7=960; p_8=1020
s_5=1.25; p_9=1140; p_10=1320
s_6=1.50; p_11=1369; p_12=1440
a=(Z0[1:p_1-1])
b=(s_1+Z0[p_1:p_2])
c=(Z0[(p_2+1):(p_3-1)])
d=(s_2+Z0[p_3:p_4])
e=(Z0[(p_4+1):(p_5-1)])
f=(s_2+Z0[p_5:p_6])
g=(Z0[(p_6+1):(p_7-1)])
h=(s_3+Z0[p_7:p_8])
i=(Z0[(p_8+1):(p_9-1)])
l=(s_4+Z0[p_9:p_10])
m=(Z0[(p_10+1):(p_11-1)])
n=(s_5+Z0[p_11:p_12])
o=Z0[(p_12+1):MM]
Z=c(a,b,c,d,e,f,g,h,i,l,m,n,o);ts.plot(Z)
abline(v=p_1,col="red");abline(v=p_2,col="red");abline(v=p_3,col="red")
abline(v=p_4,col="red");abline(v=p_5,col="red");abline(v=p_6,col="red")
abline(v=p_7,col="red");abline(v=p_8,col="red");abline(v=p_9,col="red")
abline(v=p_10,col="red");abline(v=p_11,col="red");abline(v=p_12,col="red")
Zm=as.data.frame(Z)
write.csv2(Zm, file="C:/Users/Luca/Dekstop/Zm/Zm1.csv")
I would like to repeat these commands to create 100 series and to save them with write.csv2(...Zm"...".csv).
I don't want to change the file names and repeat the commands all manually.
I searched other questions for something useful but didn't find anything.
The loop only has to change the name of the data frame (Zm) and the file name on each iteration.
I'm looking to repeat 100 times the creation of Z0 (Z01, Z02, Z03 ... Z0100), then Z (Z1, Z2, ... Z100) and Zm (Zm1, Zm2, Zm3 ... Zm100), and save them in the folder with new file names (folder/Zm1, Zm2, Zm3 etc...) all in 1 command with a loop.
I'm not sure why you want to change the name of the data frames, but dynamically changing the name of the file is straightforward.
for (i in 1:100) { ... write.csv2(Zm, file=paste("C:/Users/Luca/Dekstop/Zm/Zm", i, ".csv", sep = "")) }
If you want to keep the created data frames, why not just simply use a list?
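A fuller sketch of the list approach follows. The helper make_series() is a made-up name that wraps the series construction, with the shift sizes and segment bounds copied from the question's code as written. The files go to a temporary folder here, which is an assumption so that the example runs anywhere; the real path would replace it.

```r
# Build one shifted series; shifts and segment bounds follow the question's code.
make_series <- function(MM = 1584) {
  Z <- rnorm(MM, 8, 1.0)
  segments <- data.frame(
    shift = c(1.50, 1.25, 1.25, 1.10, 1.50, 1.25),
    from  = c(121, 361, 601, 960, 1140, 1369),
    to    = c(240, 480, 720, 1020, 1320, 1440)
  )
  for (r in seq_len(nrow(segments))) {
    idx <- segments$from[r]:segments$to[r]
    Z[idx] <- Z[idx] + segments$shift[r]   # add the level shift to this segment
  }
  Z
}

out_dir <- file.path(tempdir(), "Zm")      # stand-in for the real output folder
dir.create(out_dir, showWarnings = FALSE)

n_series <- 100
series <- vector("list", n_series)          # keeps every data frame in memory
for (i in seq_len(n_series)) {
  series[[i]] <- data.frame(Z = make_series())
  write.csv2(series[[i]], file = file.path(out_dir, paste0("Zm", i, ".csv")))
}
```

Afterwards series[[37]] plays the role that a separate variable Zm37 would have had.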

Resources