R version of splitting dataset using macro in SAS - r

I am new to using R and have previously been using SAS for all my work. I'm struggling to convert some SAS logic into R and would really appreciate some help with this. In SAS, I can use macros to split up a dataset and keep specific variables and rename the resulting datasets according to the variables. For example:
%macro data_split (field);
data data_out_&field. (keep = policy_number date &field.);
set my_data;
run;
%mend;
%data_split(area1);
%data_split(surname);
%data_split(dob);
/*
The code will produce three datasets:
data_out_area1
data_out_surname
data_out_dob
All three datasets will have the variables 'policy_number' and 'date' in them as suggested by the keep statement.
Plus, each dataset will have ONE additional variable 'area1', 'surname' and 'dob' respectively.
The output datasets have been suffixed with the variable name used in the macro "data_split".
*/
In R, I can do the following:
data_out_area1 <- mydata$area1
data_out_surname <- mydata$surname
data_out_dob <- mydata$dob
However, I lose the column names when doing this. Also, and perhaps more importantly, if I have a hundred variables I want to avoid writing out these statements a hundred times... is there a way for me to loop through the data frame and create a hundred new datasets?
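One way to mirror the macro is a small function applied over the column names, collecting the results in a named list. This is only a sketch: the toy `mydata` values are made up, `data_split()` is a hypothetical helper standing in for `%data_split`, and it assumes every data frame should keep `policy_number` and `date`.

```r
# Toy stand-in for the real data; the values are made up for illustration.
mydata <- data.frame(
  policy_number = c(101, 102, 103),
  date          = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  area1         = c("N", "S", "E"),
  surname       = c("Smith", "Jones", "Brown"),
  dob           = as.Date(c("1980-05-01", "1975-11-23", "1990-07-15")),
  stringsAsFactors = FALSE
)

# Hypothetical helper playing the role of %data_split:
# keep the two key columns plus one extra field.
# drop = FALSE guarantees a data frame (with column names) is returned.
data_split <- function(data, field, keep = c("policy_number", "date")) {
  data[, c(keep, field), drop = FALSE]
}

# Every column except the keys gets its own data frame.
fields <- setdiff(names(mydata), c("policy_number", "date"))
data_out <- lapply(fields, function(f) data_split(mydata, f))
names(data_out) <- paste0("data_out_", fields)

names(data_out)               # one entry per field
str(data_out$data_out_area1)  # policy_number, date, area1
```

Keeping the pieces in one list usually makes later looping easier than a hundred separate objects. If separate objects are really needed, `list2env(data_out, envir = globalenv())` would create them from the list.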

Related

I want to reshape this list into a data frame or table that I can analyse. Any help? See the code below. I use R.

library(httr)      # provides GET()
library(jsonlite)  # provides fromJSON()
nflight = GET('http://api.aviationstack.com/v1/flights?access_key=709b8cba703074de66ca50f1c5c69ce6')
rawToChar(nflight$content)
flight_data = fromJSON(rawToChar(nflight$content))
Welcome KMazeltov. A small point to start: please check the formatting of your question, as the code currently runs together and each statement needs to be on its own line.
I imagine you have already inspected your data, "flight_data", using str(flight_data), dim(flight_data), and View(flight_data), but if you haven't this can be a helpful place to start.
You will see that within your data there are multiple data frames already present e.g. flight_data[["data"]] is a data.frame with 100 rows and 8 columns, then flight_data[["data"]][["departure"]] is a data.frame with 100 rows and 12 columns.
So it is not yet clear which variables you want to work with or in what way but here are some recommendations:
You can save information to variables and then construct your own data frame as follows:
my_first_column <- flight_data[["data"]][["departure"]][["airport"]]
my_second_column <- flight_data[["data"]][["departure"]][["scheduled"]]
my_dataframe <- data.frame(my_first_column, my_second_column)  # cbind() on two vectors would return a matrix, not a data frame
dim(my_dataframe)
head(my_dataframe)
You can also call R's table() function on your own data, e.g. table(my_dataframe), or on the original data, e.g. table(flight_data$data$flight_status).
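If you would rather have every nested column in one flat table, the nested data frames can be spread out into ordinary columns. The sketch below uses a made-up toy object with the same shape as flight_data[["data"]] (a data frame that contains a data frame column) so that it runs without calling the API; the airport names and times are invented. I believe jsonlite also offers flatten() for the same purpose, but this version needs only base R.

```r
# Toy stand-in for flight_data[["data"]]: a data frame with a nested
# data frame column, as produced by parsing nested JSON. Values are made up.
flights <- data.frame(flight_status = c("active", "landed", "active"),
                      stringsAsFactors = FALSE)
flights$departure <- data.frame(
  airport   = c("Heathrow", "Schiphol", "Heathrow"),
  scheduled = c("2021-01-01T10:00:00", "2021-01-01T11:30:00",
                "2021-01-01T12:15:00"),
  stringsAsFactors = FALSE
)

# Passing the columns to data.frame() expands the nested data frame;
# its columns are prefixed with the parent name ("departure.").
flat <- do.call(data.frame, c(flights, list(stringsAsFactors = FALSE)))

names(flat)
table(flat$departure.airport)
```

After flattening, functions such as table(), head() and dim() work on plain columns like flat$departure.airport.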

Writing For Loop or Split function to separate data from Master data frame into smaller data frames

I am once again asking for your help and guidance! Super duper novice here so I apologize in advance for not explaining things properly or my general lack of knowledge for something that feels like it should be easy to do.
I have sets of compounds in one "master" list that need to be separated into smaller lists. I want to do this with a "for loop" or some other iterative function so that I am not changing the numbers by hand for each list. I want to separate the compounds based on the column "Run.Number" (there are 21 Run.Numbers).
Step 1: Load the programs needed and open File containing "Master List"
# tMSMS List separation
#Load library packages
library(ggplot2)
library(reshape)
library(readr) #loading the csv's
library(dplyr) #data manipulation
library(magrittr) #forward pipe
library(openxlsx) #open excel sheets
library(Rcpp) #got this from an error code while trying to open excel sheets
#STEP 1: open file
S1_MasterList<- read.xlsx("/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/220410_tMSMS_neg_R.xlsx")
Step 2: Currently, to go through each list, I have to change the "i" value for each iteration. And I also must change the name manually (Ctrl+F), by replacing "S2_Export_1" with "S2_Export_2" and so on as I move from list to list. Also, when making the smaller list, there are a handful of columns containing data that need to be removed from the “Master List”. The specific format of column names are so it will be compatible with LC-MS software. This list is saved as a .csv file, again for compatibility with LC-MS software
#STEP 2: Iterative
#Replace: S2_Export_1
i <- 1
# Rows belonging to run i
S2_Separate <- S1_MasterList[which(S1_MasterList$Run.Number == i), ]
# Keep only the columns the LC-MS software needs
S2_Export_1 <- data.frame(S2_Separate$On,
S2_Separate$`Prec..m/z`,
S2_Separate$Z,
S2_Separate$`Ret..Time.(min)`,
S2_Separate$`Delta.Ret..Time.(min)`,
S2_Separate$Iso..Width,
S2_Separate$Collision.Energy)
colnames(S2_Export_1) <- c("On", "Prec. m/z", "Z", "Ret. Time (min)", "Delta Ret. Time (min)", "Iso. Width", "Collision Energy")
write.csv(S2_Export_1, "/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/Runs/220425_neg_S2_Export_1.csv", row.names = FALSE)
Results: The output should look like this image provided below, and for this one particular data frame called "Master List", there should be 21 smaller data frames. I also want the data frames to be named S2_Export_1, S2_Export_2, S2_Export_3, S2_Export_4, etc.
First, select only required columns (consider processing/renaming non-syntactic names first to avoid extra work downstream):
s1_sub <- select(S1_MasterList, Run.Number, On, `Prec..m/z`, Z,
`Ret..Time.(min)`, `Delta.Ret..Time.(min)`,
Iso..Width, Collision.Energy)
Then split s1_sub into a list of data frames with split(), using the Run.Number column from the question as the grouping variable:
s1_split <- split(s1_sub, s1_sub$Run.Number)
Finally, name the resulting list of dataframes with setNames():
s1_split <- setNames(s1_split, paste0("S2_export_", seq_along(s1_split)))
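To finish the job, each element of the list can be written to its own .csv file in a loop. The following is a minimal sketch on a made-up three-run master list with only a few of the columns; `out_dir` and the toy values are assumptions, and the file-name pattern simply copies the one in the question.

```r
# Toy master list; values are made up and only some columns are shown.
master <- data.frame(
  Run.Number       = c(1, 1, 2, 3),
  On               = TRUE,
  Z                = 1,
  Collision.Energy = c(10, 20, 10, 30)
)

# One data frame per run, without the Run.Number column itself.
runs <- split(master[, c("On", "Z", "Collision.Energy")], master$Run.Number)
# Name each piece after its run number rather than its position.
names(runs) <- paste0("S2_Export_", names(runs))

out_dir <- tempdir()  # assumption: replace with the real output folder

for (nm in names(runs)) {
  export <- runs[[nm]]
  # Column names in the format the LC-MS software expects
  colnames(export) <- c("On", "Z", "Collision Energy")
  write.csv(export,
            file.path(out_dir, paste0("220425_neg_", nm, ".csv")),
            row.names = FALSE)
}

list.files(out_dir, pattern = "^220425_neg_S2_Export_")
```

Building the names from names(runs) instead of a counter keeps the file name tied to the actual run number, even if a run is missing from the master list.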

Merging several large datasets - Memory issue

I have about 15 different Data sets in R that I need to merge into 1 big Data set.
Combining them will create a data set of about 1120 variables and about 1500 observations.
There is no problem merging the first 5 data sets (getting to about 700 variables), but when trying to merge the 6th or 7th data set, R either hangs or gives this error message:
Error: cannot allocate vector of size 10.7 Mb
I have tried different ways to write this code (functions/loops), but this is the simplest way, by which I understood that it gets stuck on the 6th dataset:
#Merging the first two data sets
#bindedDataNames is a chr vector with the names of all the datasets that need
#to be merged.
Age11_twins_22022017 <- merge(eval(parse(text = bindedDataNames[1]))
[,-c(1:2)],
eval(parse(text = bindedDataNames[2]))
[,-c(1:3)],
by=c("ifam","ID"))
#Loop to merge all datasets. With print I saw it goes without a problem until
#the 6th dataset
for (cnt2 in 3:17) {
print(cnt2)
Age11_twins_22022017 <- merge(Age11_twins_22022017,
eval(parse(text = bindedDataNames[cnt2]))
[,-c(1:3)],
by=c("ifam","ID"))
}
I saw that there are packages for big data such as bigmemory or ff, but couldn't really figure out how to write the merge result (which is different from step to step) into this big matrix.
Is it even possible in R to merge several datasets into a really big one?
I would want to both be able to export this file to later use in SPSS and be able to do statistical analysis in R itself.
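A merged table of roughly 1500 rows by 1120 columns is small by R standards, so one thing worth checking is whether the key combination (ifam, ID) is unique in every data set. merge() returns one row for every matching pair, so repeated keys multiply the number of rows at each step, which could explain memory running out part-way through. The sketch below uses three made-up data frames; only `bindedDataNames`, `ifam` and `ID` come from the question, and mget() replaces eval(parse()).

```r
# Toy data sets sharing the keys ifam and ID; values are made up.
d1 <- data.frame(ifam = c(1, 1, 2), ID = c(1, 2, 1), a = c(10, 11, 12))
d2 <- data.frame(ifam = c(1, 1, 2), ID = c(1, 2, 1), b = c("x", "y", "z"))
d3 <- data.frame(ifam = c(1, 2),    ID = c(1, 1),    c = c(TRUE, FALSE))
bindedDataNames <- c("d1", "d2", "d3")

# Fetch the data frames by name into a named list.
datasets <- mget(bindedDataNames)

keys <- c("ifam", "ID")

# TRUE for any data set in which a key combination occurs more than once.
dup_keys <- vapply(datasets,
                   function(d) anyDuplicated(d[, keys]) > 0,
                   logical(1))
dup_keys

# Merge everything in one pass once the keys are known to be unique.
merged <- Reduce(function(x, y) merge(x, y, by = keys), datasets)
dim(merged)
```

If dup_keys flags a data set, deduplicating or aggregating it before the merge should keep the result at the expected size. The final data frame can then be exported like any other.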

replace NA with nothing in R changes variable types to categorical

Is there any way to replace NA with a blank (or nothing) without converting the columns to character?
I used
data_model <- sapply(data_model, as.character)
data_model[is.na(data_model)] <- " "
data_model=data.table(data_model)
However, this changes all the columns' types to categorical.
I want to save the data set and use it in SAS, which does not understand NA.
Here's a somewhat belated answer (and some shameless self-promotion) from The R Primer on how to export a data frame to SAS. It should handle your NAs correctly and automatically:
First, you can use the foreign package to export the data frame as a plain-text data file together with a SAS script that reads it. Here, I'll just export the trees data frame.
library(foreign)
data(trees)
write.foreign(trees, datafile = "toSAS.dat",
codefile="toSAS.sas", package="SAS")
This gives you two files, toSAS.dat and toSAS.sas. It is easy to get the data into SAS since the codefile toSAS.sas contains a SAS script that can be read and interpreted directly by SAS and reads the data in toSAS.dat.
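If a plain .csv file is enough, another option is to leave the data frame untouched and only change how NA is written at export time. This is a small sketch with made-up data; `out_file` is a temporary file used for illustration. As far as I know, SAS treats empty fields in delimited data as missing values, but that part is an assumption worth checking on your side.

```r
# Toy data with missing values in a numeric and a character column.
df <- data.frame(
  id    = 1:3,
  score = c(1.5, NA, 3),
  group = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")  # illustrative output path

# na = "" writes missing values as empty fields instead of the text NA.
write.csv(df, out_file, na = "", row.names = FALSE)

readLines(out_file)  # inspect the raw file
sapply(df, class)    # column types in R are unchanged
```

Because the replacement happens only in the file, the numeric columns stay numeric inside R.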

Using R to create and merge zoo object time series from csv files

I have a large set of csv files in a single directory. These files contain two columns, Date and Price. The <filename> part of each <filename>.csv is the unique identifier of the data series. I understand that missing values for merged data series can be handled when these time series are zoo objects. I also understand that, by using na.locf(merge()), I can fill in the missing values with the most recent observations.
I want to automate the process of:
1. loading the Date and Price columns of each *.csv file into R data frames;
2. giving each distinct time series within the merged zoo "portfolio of time series" object an identity equal to its <filename>;
3. merging these zoo time series using MergedData <- na.locf(merge( )).
The ultimate goal, of course, is to use the fPortfolio package.
I've used the following statement to create a data frame of Date,Price pairs. The problem with this approach is that I lose the <filename> identifier of the time series data from the files.
result <- lapply(files, function(x) x <- read.csv(x) )
I understand that I can write code to generate the R statements required to do all these steps instance by instance. I'm wondering if there is some approach that wouldn't require me to do that. It's hard for me to believe that others haven't wanted to perform this same task.
Try this:
z <- read.zoo(files, header = TRUE, sep = ",")
z <- na.locf(z)
I have assumed a header line and lines like 2000-01-31,23.40 . Use whatever read.zoo arguments are necessary to accommodate whatever format you have.
You can get better formatting using sapply (which keeps the file names). Here I will stick with lapply.
Assuming that all your files are in the same directory, you can use list.files, which is very handy for this kind of workflow.
I would use read.zoo to get zoo objects directly (avoiding later coercion).
For example:
zoo.objs <- lapply(list.files(path=MY_FILES_DIRECTORY,
pattern='^zoo_.*\\.csv$', ## regular expression: csv files
## whose names start with zoo_
full.names=TRUE), ## to get full names path+filename
read.zoo, header=TRUE, sep=",") ## assumes a header line and comma-separated columns
I now use list.files again to name my result:
names(zoo.objs) <- list.files(path=MY_FILES_DIRECTORY,
pattern='^zoo_.*\\.csv$')
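The same naming idea also works with plain read.csv if you want to inspect the data frames before converting anything to zoo. The sketch below writes two made-up files to a temporary folder so that it is self-contained; the file names AAA.csv and BBB.csv, their contents and `demo_dir` are assumptions for illustration only.

```r
# Two toy csv files in a fresh temporary folder.
demo_dir <- tempfile("series_demo")
dir.create(demo_dir)
write.csv(data.frame(Date = c("2000-01-31", "2000-02-29"), Price = c(23.4, 24.1)),
          file.path(demo_dir, "AAA.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2000-01-31", "2000-03-31"), Price = c(10.0, 10.5)),
          file.path(demo_dir, "BBB.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Identifier = file name without folder and extension.
ids <- tools::file_path_sans_ext(basename(files))

# Named list of data frames, so the identifier is never lost.
result <- setNames(lapply(files, read.csv), ids)

# Rename each Price column to its identifier, then merge on Date,
# keeping dates that appear in only some of the files.
renamed <- Map(function(d, id) setNames(d, c("Date", id)), result, ids)
wide <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), renamed)
wide
```

The resulting wide table has one column per series and NA where a series has no observation for a date, which is the shape that na.locf is meant to fill.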

Resources