Using data that is not in common between CSV files (solved) - r

Is there a way to use only the data from one CSV file that does not also appear in another CSV file? I recently split some data to conduct an EFA and a CFA. I need to avoid reusing the rows that I will use for the EFA, because otherwise there is no point in randomly splitting the data.
So how do I use only the data that I did not use in the CFA? If anyone can help, it would be much appreciated.
Edit:
What I did was the following:
library(dplyr)
Usage <- anti_join(file_one, file_two, by = "the column I could separate by")
Then I just exported the result to a CSV file. Thank you all!
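For anyone who wants to try the idea without dplyr, here is a minimal base-R sketch of the same step. The data frames, the `id` column, and the output path are all made up for illustration; for a single key column the `%in%` filter should keep the same rows that `anti_join()` keeps.

```r
# Hypothetical full data set and the half already used for one analysis
full_data <- data.frame(id = 1:6, score = c(3, 5, 2, 4, 1, 5))
used_half <- full_data[c(1, 3, 5), ]

# Keep only the rows of full_data whose id does not appear in used_half
unused_half <- full_data[!full_data$id %in% used_half$id, ]

# Export the remaining rows; tempfile() stands in for a real path
out_path <- tempfile(fileext = ".csv")
write.csv(unused_half, out_path, row.names = FALSE)
```

With real files, `full_data` and `used_half` would come from `read.csv()` instead of being built inline.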

Related

r - Extract data from vcf file

I analyzed DNA sequences in a bioinformatics pipeline to identify genetic variants of my samples. The effects of these variants have been estimated using the software snpEff. It returns a vcf file like this example file.
Since I have a multitude of those vcf files, I'd like to read in the vcf files and extract data from the annotation field (ANN=). The problem I have is that every line after the header contains an ANN field, but the number of annotations can vary from line to line. Thus, I'm looking for a simple way to convert the annotation subfields into a list of data frames (one row for every annotation, columns for the annotation subfields).
I'd appreciate any suggestions on how to extract the annotation info. Thanks a lot in advance!
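One possible way to get a list of data frames is sketched below in base R. It assumes the usual ANN layout (annotations separated by commas, subfields separated by `|`). The INFO strings are invented, and only four hypothetical subfield names are used, whereas real snpEff output carries more subfields per annotation.

```r
# Made-up INFO strings; real snpEff ANN entries have more subfields
info <- c(
  "DP=10;ANN=A|missense_variant|MODERATE|GENE1,A|upstream_gene_variant|MODIFIER|GENE2",
  "DP=5;ANN=T|synonymous_variant|LOW|GENE3"
)
# Hypothetical column names for the four subfields used above
ann_cols <- c("allele", "effect", "impact", "gene")

parse_ann <- function(info_field, col_names) {
  # Isolate the text after "ANN=" up to the next ";" (or the end of the field)
  ann <- sub(".*ANN=([^;]*).*", "\\1", info_field)
  # One element per annotation, then one vector of subfields per annotation
  annotations <- strsplit(ann, ",", fixed = TRUE)[[1]]
  parts <- strsplit(annotations, "|", fixed = TRUE)
  n <- length(col_names)
  # Pad (or truncate) each annotation to n subfields so rows line up
  rows <- vapply(parts, function(p) { length(p) <- n; p }, character(n))
  df <- as.data.frame(t(rows), stringsAsFactors = FALSE)
  names(df) <- col_names
  df
}

# One data frame per variant line, one row per annotation
ann_list <- lapply(info, parse_ann, col_names = ann_cols)
```

Padding with `NA` is a deliberate choice here because `strsplit()` drops a trailing empty subfield, which would otherwise leave rows of unequal length.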

Is there a way to read in a sample of an Rda file?

I have a very large dataset in an Rda file that I want to use for a Shiny app, but since it's so large I'm thinking of taking just a sample of the file and reading that in. Is there any way to do that?
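As far as I know, `load()` restores whole objects and has no option for reading only some rows. One workaround is therefore to load the file once outside the app, draw the sample, and save that smaller object for the app to read. A rough sketch, where the data frame `big`, the sample size, and the temporary paths are all invented for illustration:

```r
# Build a stand-in for the large Rda file (hypothetical data)
big <- data.frame(id = 1:1000, value = rnorm(1000))
rda_path <- tempfile(fileext = ".rda")
save(big, file = rda_path)

# One-off preparation step: load everything into its own environment
env <- new.env()
load(rda_path, envir = env)

# Draw a random sample of rows and save it as a small .rds file
set.seed(1)
small <- env$big[sample(nrow(env$big), 100), ]
rds_path <- tempfile(fileext = ".rds")
saveRDS(small, rds_path)

# In the app, only the small file would be read
app_data <- readRDS(rds_path)
```

The full file still has to be loaded once, but the app itself only ever touches the sampled `.rds` file.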

extracting data from matlab file in R

This is the first time I have dealt with Matlab files in R.
The reason for saving the data in a .mat file was its length (the dataset contains 226518 rows). We were worried that Excel (and then a CSV) would not handle that many rows.
I can upload the original file if necessary.
When I open the file in Matlab, everything looks fine.
There are various arrays, and the one I want is called "allPoints".
I can open it and see that it contains values around 0.something.
What I want to do is to extract the same data in R.
library(R.matlab)
df <- readMat("170314_Col_HD_R20_339-381um_DNNhalf_PPP1-EN_CellWallThickness.mat")
str(df)
And here I get stuck. How do I pull out "allPoints" from it? $ does not seem to work.
I will have multiple files that need to be combined into a single data frame in R, so the plan is to mutate each extracted df to add a new column for the sample, and then rbind them together.
Could anybody help?
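The combining plan described above could look roughly like the sketch below. It assumes `readMat()` gives back a named list containing an element called `allPoints`; plain lists with made-up values stand in for the real files so that the code runs without R.matlab. Running `names()` on the real object should show what the elements are actually called.

```r
# Stand-ins for what readMat() might return for two files (hypothetical values)
mats <- list(
  sampleA = list(allPoints = matrix(c(0.12, 0.34, 0.56), ncol = 1)),
  sampleB = list(allPoints = matrix(c(0.21, 0.43), ncol = 1))
)

# Inspect the element names before trying to extract anything
names(mats$sampleA)

# Pull out allPoints with [[ ]] and tag each row with its sample name
extract_points <- function(mat, sample_name) {
  data.frame(
    sample = sample_name,
    value = as.vector(mat[["allPoints"]]),
    stringsAsFactors = FALSE
  )
}

# One data frame per file, then stack them
combined <- do.call(rbind, Map(extract_points, mats, names(mats)))
rownames(combined) <- NULL
```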

Imputation package "mi" output

I am using the "mi" package for imputation of missing values. I have run the following code:
# 'mi' package code
library(mi)
imp_rd <- mi(rd1)  # rd1 is my data file containing 7 variables
summary(imp_rd)
hist(imp_rd)
Now I want to save the output of "imp_rd" (which is my imputed data file) as a .csv file. Can anyone help me with this problem?
If you want to export the imputed data sets generated by the model that mi estimated, a good way to do it is with the mi2stata function, which lets you export to either .dta or .csv format.
But remember not to think in terms of exporting "one" imputed data set. The whole point of multiple imputation is that you get several different imputed data sets, which together let you account for the uncertainty induced by the data that were originally missing.
So be sure to specify how many imputed data sets you want to export and the path where you want to save them. In the following example I chose to generate 10 imputed data sets.
library(mi)
imp_rd<-mi(rd1)
mi2stata(imp_rd, m=10, "pathtofile/imp_rd.csv")
Hope you find this useful.
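If you would rather have one CSV per imputed data set, a plain write.csv loop also works once the completed data sets are available as a list of data frames. The sketch below only shows that export step: the list is a made-up stand-in, not real mi output, and the file names are hypothetical.

```r
# Stand-in for m = 3 imputed data sets (made-up values, not real mi output)
set.seed(1)
imputed_sets <- lapply(1:3, function(i) {
  data.frame(x = 1:4, y = round(rnorm(4), 2))
})

# One numbered file per imputed data set; tempdir() stands in for a real folder
out_dir <- tempdir()
out_files <- file.path(out_dir, sprintf("imp_rd_%02d.csv", seq_along(imputed_sets)))

for (i in seq_along(imputed_sets)) {
  write.csv(imputed_sets[[i]], out_files[i], row.names = FALSE)
}
```

Keeping the files separate makes it straightforward to run the same analysis on each one and pool the results afterwards.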
If your output is a data frame, you can use:
write.csv(imp_rd, file = "imp_rd.csv")
This should save the file as a CSV in your working directory. (write.csv always uses a comma as the separator, so the sep argument is not needed.)
thanks

reaching max.print on R

I just found a bunch of weather data that I would like to play around with in glmnet in R. First I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R isn't able to print it all. Is there a way I can view all the raw data in R, or just in the file itself? I've tried opening the file in Excel with no success. Thanks!
Try using frequency tables; you can group the values into segments.
str(), summary(), table(), pairs(), plot(), etc. There are several libraries (such as descr) which facilitate analyzing numeric variables and factor levels. Let me know if you need help with any.
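A small sketch of what the suggestions above could look like in practice. The `weather` data frame and its `temp` column are simulated stand-ins for the real data, and the number of bins is an arbitrary choice.

```r
# Simulated stand-in for one large weather variable (hypothetical data)
set.seed(42)
weather <- data.frame(temp = rnorm(100000, mean = 15, sd = 8))

# Compact overviews instead of printing every value
str(weather)
summary(weather$temp)

# Frequency table over 10 equal-width segments of the observed range
temp_bins <- cut(weather$temp, breaks = 10)
freq <- table(temp_bins)
print(freq)

# Look at just the first few raw rows
head(weather, 10)

# If more printed output is really needed, the limit can be raised temporarily
old <- options(max.print = 200000)
options(old)  # restore the previous setting
```

Summaries and binned counts usually say more about a variable than scrolling through every value, and they stay well under the max.print limit.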
