Incomplete list of CSV files imported in R

I need to import a list of 36 csv files, but after running the code I get only 26 of them. Probably, 10 files have format problems. Is there a way in R to detect the 10 files that cannot be imported?

If you have the file names in a vector, you can use the following code:
all <- c("16048.txt", "16062.txt", "16066.txt", "16093.txt", "16095.txt", "16122.txt", "16241.txt", "16360.txt", "16380.txt", "16389.txt", "16510.txt", "16511.txt", "16701.txt", "16729.txt", "16735.txt", "16737.txt", "16761.txt", "16816.txt", "16867.txt", "16876.txt", "16880.txt", "16883.txt", "16884.txt", "16885.txt", "16893.txt", "16904.txt", "16906.txt", "16908.txt", "16929.txt", "16931.txt", "16938.txt", "16943.txt", "16959.txt", "16967.txt", "16968.txt", "16969.txt")
imp <- c("16761.txt", "16959.txt", "16884.txt", "16093.txt", "16883.txt", "16122.txt", "16906.txt", "16737.txt", "16968.txt", "16095.txt", "16062.txt", "16816.txt", "16360.txt", "16893.txt", "16885.txt", "16938.txt", "16048.txt", "16931.txt", "16876.txt", "16511.txt", "16969.txt", "16241.txt", "16967.txt", "16701.txt", "16380.txt", "16510.txt")
Here all is the vector of file names you need and imp is the incomplete set you actually imported. You can get the missing files with:
missing <- all[!all %in% imp]
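If you would rather have R report which files fail while importing, one option is to wrap each read in tryCatch() and collect the errors. This is only a minimal sketch: the demo folder, the three small files, and the use of read.csv() are assumptions for illustration, not the asker's actual import code.

```r
# Hypothetical demo folder: one readable file, one malformed file, one empty file
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("a,b", "1,2", "3,4"), file.path(demo_dir, "good.txt"))
writeLines(c("a,b", "1,2,3,4,5"), file.path(demo_dir, "bad.txt"))  # more columns than names
file.create(file.path(demo_dir, "empty.txt"))                      # no lines at all

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Try to read every file; keep the error object instead of stopping
results <- lapply(files, function(f) tryCatch(read.csv(f), error = function(e) e))
names(results) <- basename(files)

# Files whose result is an error object could not be imported
is_error <- vapply(results, inherits, logical(1), what = "error")
failed <- names(results)[is_error]
failed
```

Calling conditionMessage() on each failed element, for example lapply(results[is_error], conditionMessage), then shows why each file was rejected.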

Related

Graphing Values from multiple H5/HDF5 files at once

I've figured out how to read and name multiple H5 files from my directory, but I'm running into trouble actually graphing them. My problem is twofold: with this type of file, I do not know how to make the columns have the same number of rows, and I do not know how to call on specific files.
My initial setup is as follows:
library("rhdf5")
library("ggplot2")
library("fs")
library("tidyverse")
wd <- "D:/Data/1282-1329/"
setwd(wd)
testh5 <- H5Fopen("1282.h5")
H5Fclose(testh5)
y <- h5read(file = "1282.h5",
name = "/Signal")
x <- h5read(file = "1282.h5",
name = "/Scan")
The / refers to the H5 file's 'Group', and Signal or Scan refers to the 'Name'; thus "/Signal" creates a numerical list with a length of 48 (the number of files within 1282-1329). I make multiple lists from each of these by doing:
file_paths <- fs::dir_ls("D:/Data/1282-1329/H5")
file_paths
file_Scan <- list()
for (i in seq_along(file_paths)) {
file_Scan[[i]] <- h5read(
file = file_paths[[i]],
name = "/Scan"
)
}
file_Signal <- list()
for (i in seq_along(file_paths)) {
file_Signal[[i]] <- h5read(
file = file_paths[[i]],
name = "/Signal"
)
}
file_Scan <- setNames(file_Scan, file_paths)
file_Signal <- setNames(file_Signal, file_paths)
Thus str(file_Signal) gives me something like..
List of 48
$ D:/Data/1282-1329/H5/1282.h5: num [1:8044(1d)] 11569527 11576106 10848312 11007212 11074822 ...
$ D:/Data/1282-1329/H5/1283.h5: num [1:8045(1d)] 9746633 9886735 10000637 9617273 ...
So my first problem here is [1:8044(1d)] and [1:8045(1d)] - they're one row off. But I'm unable to add in NAs or make the lengths the same as I would a normal list. Is it because I'm thinking about this wrong? I feel like the solution is simple.
My ultimate goal will be to create multiple single plots for each of these files in the directory using something like
for (i in seq_along(file_paths)) {
plots[[i]] = ggplot(file_paths, aes(x=file_Signal, y=file_Scan))+
geom_point(size=1)
}
Then use these to create a rolling gif of the files with Even numbers (1282, 1284, 1286, etc) and Odd numbers (1283, 1285, 1287, etc.)
Thank you for any help or resources you might have to offer.
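One possible way to think about it, sketched in base R with made-up data: the lists below are small stand-ins for what h5read() returns (assumed here to be 1-d arrays), and the names per_file and padded are hypothetical. If each file is plotted on its own, only the Scan and Signal of the same file need equal lengths, so one data frame per file may be enough. If equal lengths across files are still wanted, assigning to length() pads a plain vector with NA.

```r
# Simulated stand-ins for the lists built with h5read() (assumption: 1-d arrays)
file_Scan <- list(`1282.h5` = array(1:5, dim = 5),
                  `1283.h5` = array(1:6, dim = 6))
file_Signal <- list(`1282.h5` = array(c(10, 12, 11, 13, 12), dim = 5),
                    `1283.h5` = array(c(9, 10, 11, 10, 9, 8), dim = 6))

# One data frame per file; as.vector() drops the 1-d array dimension
per_file <- Map(function(scan, signal) {
  data.frame(Scan = as.vector(scan), Signal = as.vector(signal))
}, file_Scan, file_Signal)

# Optional: pad every Signal vector with NA up to the longest length
n_max <- max(lengths(file_Signal))
padded <- sapply(file_Signal, function(v) {
  v <- as.vector(v)
  length(v) <- n_max  # extends the vector, filling with NA
  v
})
```

A single file can then be addressed by name, e.g. per_file[["1282.h5"]], and passed as the data argument of ggplot() with aes(x = Signal, y = Scan) inside the plotting loop.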

R studio write to CSV without number-column

I want to export my data to CSV, but without the row-number column (or with that column renumbered). Sorry for my bad English. My code:
tab<-read.csv2("CENY.csv")
#View(tab)
library(reshape2)
tab.m<-melt(tab,id.vars="Nazwa")
View(tab.m)
dim(tab.m)
tab.m<-tab.m[5:436,]
mies<-rep(c("sty","lut","mar","kwi","maj","cze","lip","sie","wrz","paź","lis","gru"),each=36)
produkt<-rep(rep(c("cytryny","marchew","cebula"),each=12),12)
rok<-rep(rep(c(2017,2018,2019),each=4),36)
length(mies)
length(produkt)
length(rok)
dane.m<-data.frame(tab.m$Nazwa,tab.m$value,mies=mies,produkt=produkt,rok=rok)
#View(dane.m)
X<-split(dane.m, dane.m$produkt)
str(X)
dane <- X$cebula[,-4]
colnames(dane)<-c("region","cebula","mies","rok")
dane$cytryny<-X$cytryny$tab.m.value
dane$marchew<-X$marchew$tab.m.value
#View(dane)
write.csv(dane, "dane-ceny.csv")
and I get:
"","region","cebula","mies","rok","cytryny","marchew"
"25","POLSKA",1.78,"sty",2017,6.56,1.64
"26","MAZOWIECKIE",1.59,"sty",2017,6.9,1.57
"27","OPOLSKIE",1.77,"sty",2017,7.05,1.85
"28","PODKARPACKIE",1.39,"sty",2017,5.83,1.4

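The unnamed first column in that output is the data frame's row names, which write.csv() writes by default. A minimal sketch of two ways around it, using a two-row stand-in for dane (the values are copied from the output above, the temporary file name is an assumption):

```r
# Two-row stand-in for 'dane', keeping the original row names "25" and "26"
dane <- data.frame(region = c("POLSKA", "MAZOWIECKIE"),
                   cebula = c(1.78, 1.59),
                   row.names = c("25", "26"))

# Option 1: do not write the row names at all
out <- tempfile(fileext = ".csv")
write.csv(dane, out, row.names = FALSE)
csv_lines <- readLines(out)
csv_lines

# Option 2: keep the column but renumber the rows from 1
rownames(dane) <- NULL
```
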
How do I import a file into r with extension .DUSMCPUB?

I’m trying to import the Mortality Multiple Cause Files from the National Center for Health Statistics, located at this link:
https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Downloadable
The files have an extension .DUSMCPUB (e.g., the file for 2020 is called "VS20MORT.DUSMCPUB_r20220105"). How do I import such a file? I’m not familiar with the extension.
I have tried to import with the following code, but it causes my R program to terminate. Can you please provide me with a suggestion on how to import these types of files?
VS20MORT <- read_delim("VS20MORT.DUSMCPUB_r20220105")
Thanks @Mel G for sharing this approach. When I tried to run it, I realized that the mortality file includes a few new variables as of 2020 (namely decedent’s occupation and industry). Here’s a slight variation that includes the new variables.
# Install and load necessary packages
# install.packages("sqldf") # Used to read in DUSMCPUB file
# install.packages("dplyr") # Used for tidy data management
library(sqldf)
library(dplyr)
# Increase memory limit to make space for the large file
# Note: memory.limit() is Windows-only and has no effect from R 4.2.0 onwards
# memory.limit()
memory.limit(size=20000)
# Create dataframe containing the column widths and names
columns <- data.frame(widths=c(19,1,40,2,1,1,2,2,1,4,1,2,2,2,2,1,1,1,16,4,1,1,1,
1,34,1,1,4,3,1,3,3,2,1,2,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,36,2,1,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,5,5,5,5,5,1,2,1,1,1,1,33,3,1,1,2,315,4,2,4,2))
columns$names <- c("blank1", # tape locations 1-19
"Resident_Status_US", # tape location 20
"blank2",
"Education_1989",
"Education_2003",
"Education_flag",
"Month_of_Death",
"blank3",
"Sex",
"DetailAge",
"Age_Substitution_Flag",
"Age_Recode_52",
"Age_Recode_27",
"Age_Recode_12",
"Infant_Age_Recode_22",
"Place_of_Death_and_Status",
"Marital_Status",
"Day_of_Week_of_Death",
"blank4",
"Current_Data_Year",
"Injury_at_Work",
"Manner_of_Death",
"Method_of_Disposition",
"Autopsy",
"blank5",
"Activity_Code",
"Place_of_Injury",
"ICD_Code_10",
"Cause_Recode_358",
"blank6",
"Cause_Recode_113",
"Infant_Cause_Recode_130",
"Cause_Recode_39",
"blank7",
"Number_Entity_Axis_Conditions",
"Condition_1EA", "Condition_2EA", "Condition_3EA", "Condition_4EA", "Condition_5EA",
"Condition_6EA", "Condition_7EA", "Condition_8EA", "Condition_9EA", "Condition_10EA",
"Condition_11EA", "Condition_12EA", "Condition_13EA", "Condition_14EA", "Condition_15EA",
"Condition_16EA", "Condition_17EA", "Condition_18EA", "Condition_19EA", "Condition_20EA",
"blank8",
"Number_Record_Axis_Conditions",
"blank9",
"Condition_1RA", "Condition_2RA", "Condition_3RA", "Condition_4RA", "Condition_5RA",
"Condition_6RA", "Condition_7RA", "Condition_8RA", "Condition_9RA", "Condition_10RA",
"Condition_11RA", "Condition_12RA", "Condition_13RA", "Condition_14RA", "Condition_15RA",
"Condition_16RA", "Condition_17RA", "Condition_18RA", "Condition_19RA", "Condition_20RA",
"blank10",
"Race",
"Bridged_Race_Flag",
"Race_Imputation_Flag",
"Race_Recode_3",
"Race_Recode_5",
"blank11",
"Hispanic_Origin",
"blank12",
"Hispanic_Origin_9_Race_Recode",
"Race_Recode_40",
"blank13",
"CensusOcc",
"Occ_26",
"CensusInd",
"Ind_23")
# Read in file using parameters from 'columns' dataframe
mort2020 <- read.fwf("VS20MORT.DUSMCPUB_r20220105", widths=columns$widths, stringsAsFactors=FALSE)
# Attach column names to variables
colnames(mort2020) <- columns$names
# Remove blank variables
mort2020x <- mort2020 %>% dplyr::select(-starts_with("blank"))
Alternatively, it looks like the files are published for most years in a CSV format here: https://www.nber.org/research/data/mortality-data-vital-statistics-nchs-multiple-cause-death-data. 2020 isn’t up yet, but for other years, it can be much faster to read a CSV into R than to use read.fwf.
The data is in the form of a fixed-width file. The user's guide to the data from the National Center for Health Statistics contains the appropriate widths. The answer I present is a modified answer from another forum, posted by @Hack-R.
https://opendata.stackexchange.com/questions/18375/how-can-one-interpret-the-nvss-mortality-multiple-cause-of-death-data-sets
map <- data.frame(widths=c(19, 1,40,2,1,1,2,2,1,1,1,1,1,1,2,2,2,2,1,1,1,16,4,1,1,1,1,34,1,1,4,
3,1,3,3,2,1,2,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
36,2,1,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,1,1,1,1,33,3,
1,1))
#Set column names
map$cn <- c("blank", # cols 1-19
"res_status", #20
"blank2", # 21-60
"ed_v89",#61-62
"ed_v03",#63
"ed_flag", #64
"death_month", #65-66
"blank3",
"sex",
"age_years",
"age_months",
"age_3",
"age_4",
"age_sub_flag",
"age_recode_52",
"age_recode_27",
"age_recode_12",
"infant_age_recode_22",
"place_of_death",
"marital_status",
"death_day",
"blank4",
"current_year",
"work_injury",
"death_manner",
"disposition",
"autopsy",
"blank5",
"activity_code",
"place_injured",
"icd_cause_of_death",
"cause_recode358",
"blank6",
"cause_recode113",
"infant_cause_recode130",
"cause_recode39",
"blank7",
"num_entity_axis",
"cond1","cond2","cond3","cond4","cond5","cond6","cond7","cond8","cond9","cond10",
"cond11","cond12","cond13","cond14","cond15","cond16","cond17","cond18","cond19",
"cond20",
"blank7",
"num_rec_axis_cond",
"blank8",
"acond1", "acond2", "acond3", "acond4", "acond5", "acond6", "acond7",
"acond8", "acond9", "acond10", "acond11", "acond12", "acond13", "acond14",
"acond15", "acond16", "acond17", "acond18", "acond19", "acond20",
"blank9",
"race",
"bridged_race_flag",
"race_imp_flag",
"race_recode3",
"race_recode5",
"blank10",
"hisp",
"blank11",
"hisp_recode")
#Import the file (read_fwf() and fwf_widths() come from the readr package)
library(readr)
mort2020 <- read_fwf("./data/original/VS20MORT.DUSMCPUB_r20220105", fwf_widths(map$widths, map$cn))
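To see how a widths vector maps onto columns, it may help to try the same idea on a tiny made-up file first. This sketch uses base R's read.fwf(); the two records, the widths, and the column names are invented for illustration and have nothing to do with the real NCHS layout.

```r
# Hypothetical two-record fixed-width file: 2 chars code, 2 blanks, 2 chars age, 1 char sex
tmp <- tempfile(fileext = ".txt")
writeLines(c("AB  12M",
             "CD  34F"), tmp)

toy <- read.fwf(tmp,
                widths = c(2, 2, 2, 1),
                col.names = c("code", "blank1", "age", "sex"),
                colClasses = "character")  # keep codes as text, as with ICD codes

# Drop filler columns, mirroring the "blank" columns in the answers above
toy <- toy[, !startsWith(names(toy), "blank")]
toy
```

Each width consumes that many characters from the start of the remaining line, which is why the filler positions need their own (later discarded) columns.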

merge different files into 1 text file in R

I have two files, one containing text and the other a data frame, and I want to merge them into one text file. On Linux, I can use:
cat file1 file2 > outputfile
I wonder if we can do the same thing with R?
file1
##TITLE=POOLED SAMPLES COLLECTED 05-06/03/2018
##JCAMP-DX=4.24
##DATA TYPE=LINK
#ORIGIN Bentley_FTS SomaCount_FCM 82048
##OWNER=Bentley Instruments Inc
##DATE=2018-03-09
##TIME=15:34:48
##BLOCKS=110
##LAB1=Auto Generated
##LAB2=
##CHANNELNAMES=8
file 2:
649.025085449219 0.063037105 0.021338785 -0.00053594 0.008937807 0.03266982
667.675231457819 0.028557044 0.005877694 -0.015043681 0.014945094 0.051547796
686.325377466418 0.021499421 0.017125281 0.043007832 0.04132269 0.027496092
704.975523475018 0.006128653 -0.014599532 -0.000335723 0.020189898 0.024547976
723.625669483618 0.018550962 0.018567896 0.014100821 0.013067127 0.027075281
742.275815492218 0.030145327 0.039745297 0.050556265 0.056621946 0.058416516
760.925961500818 0.040279277 0.01392867 -0.00143011 0.015103153 0.03580305
779.576107509418 0.031955898 0.013671243 0.000861743 0.000641993 0.001747168
Thanks a lot
Phuong
We can use file.append:
file.append("fileMerged.txt", "file1.txt")
file.append("fileMerged.txt", "file2.txt")
Or if files are already imported into R, then write with append:
#import to R
f1 <- readLines("file1.txt")
f2 <- readLines("file2.txt")
# output with append
write(f1, "fileMerged.txt")
write(f2, "fileMerged.txt", append = TRUE)
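A self-contained way to try the file.append approach without touching real data, as a rough sketch: the files below are shortened stand-ins for file1 and file2, written to a temporary folder (all paths are assumptions).

```r
# Hypothetical stand-ins for file1 and file2, created in a temporary folder
work_dir <- tempdir()
file1 <- file.path(work_dir, "file1.txt")
file2 <- file.path(work_dir, "file2.txt")
merged <- file.path(work_dir, "fileMerged.txt")
writeLines(c("##TITLE=POOLED SAMPLES COLLECTED 05-06/03/2018", "##BLOCKS=110"), file1)
writeLines(c("649.025085449219 0.063037105", "667.675231457819 0.028557044"), file2)

# Start from an empty output file so that reruns do not keep appending
file.create(merged)

# Append in order; each call returns TRUE on success
ok <- c(file.append(merged, file1), file.append(merged, file2))
merged_lines <- readLines(merged)
merged_lines
```

Because file.append() adds to whatever is already in the target, emptying the target first (here with file.create()) keeps the result the same as cat file1 file2 > outputfile when the script is run more than once.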

For Loop in R, all in 1 command

I created this random time series:
MM=1584
Z0<-rnorm(MM,8,1.0)#;ts.plot(Z0)
s_1=1.50; p_1=121; p_2=240
s_2=1.25; p_3=361; p_4=480
s_3=1.10; p_5=601; p_6=720
s_4=1.50; p_7=960; p_8=1020
s_5=1.25; p_9=1140; p_10=1320
s_6=1.50; p_11=1369; p_12=1440
a=(Z0[1:(p_1-1)])
b=(s_1+Z0[p_1:p_2])
c=(Z0[(p_2+1):(p_3-1)])
d=(s_2+Z0[p_3:p_4])
e=(Z0[(p_4+1):(p_5-1)])
f=(s_2+Z0[p_5:p_6])
g=(Z0[(p_6+1):(p_7-1)])
h=(s_3+Z0[p_7:p_8])
i=(Z0[(p_8+1):(p_9-1)])
l=(s_4+Z0[p_9:p_10])
m=(Z0[(p_10+1):(p_11-1)])
n=(s_5+Z0[p_11:p_12])
o=Z0[(p_12+1):MM]
Z=c(a,b,c,d,e,f,g,h,i,l,m,n,o);ts.plot(Z)
abline(v=p_1,col="red");abline(v=p_2,col="red");abline(v=p_3,col="red")
abline(v=p_4,col="red");abline(v=p_5,col="red");abline(v=p_6,col="red")
abline(v=p_7,col="red");abline(v=p_8,col="red");abline(v=p_9,col="red")
abline(v=p_10,col="red");abline(v=p_11,col="red");abline(v=p_12,col="red")
Zm=as.data.frame(Z)
write.csv2(Zm, file="C:/Users/Luca/Dekstop/Zm/Zm1.csv")
I would like to repeat these commands to create 100 series and to save these with write.csv2(...Zm"...".csv).
I don't want to change the file names and repeat the commands all manually.
I searched something useful in other questions but I didn't find it.
The loop has to change only the name of data frame (Zm) and the file names, for each loop.
I'm looking to repeat 100 times the creation of Z0 (Z01, Z02, Z03 ... Z0100) , then Z (Z1, Z2, ... Z100) so Zm (Zm1, Zm2, Zm3... Zm100) and save them in the folder with new file names (folder/Zm1, Zm2, Zm3 etc...) all in 1 command with a loop.
I'm not sure why you want to change the name of the data frames, but dynamically changing the name of the file is straightforward.
for (i in 1:100) { ... write.csv2(Zm, file=paste("C:/Users/Luca/Dekstop/Zm/Zm", i, ".csv", sep = "")) }
If you want to keep the created data frames, why not just simply use a list?
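A rough sketch of the list idea, under some assumptions: the helper make_series() and the temporary output folder are hypothetical, and the shifts are applied in place to the same index ranges as in the question (including its reuse of s_2 for the third block) instead of rebuilding Z from the pieces a to o.

```r
# Hypothetical output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "Zm")
dir.create(out_dir, showWarnings = FALSE)

# Segment boundaries and shifts copied from the question
from  <- c(121, 361, 601, 960, 1140, 1369)
to    <- c(240, 480, 720, 1020, 1320, 1440)
shift <- c(1.50, 1.25, 1.25, 1.10, 1.50, 1.25)

# Hypothetical helper: one random series with the level shifts added
make_series <- function(n = 1584) {
  z <- rnorm(n, 8, 1.0)
  for (k in seq_along(from)) {
    idx <- from[k]:to[k]
    z[idx] <- z[idx] + shift[k]
  }
  z
}

# Keep every series in a list and write each one to its own numbered file
series <- vector("list", 100)
for (i in seq_along(series)) {
  series[[i]] <- make_series()
  write.csv2(data.frame(Z = series[[i]]),
             file = file.path(out_dir, paste0("Zm", i, ".csv")))
}
```

The i-th series stays available as series[[i]], so there is no need for separately named objects such as Zm1, Zm2, and so on.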

Resources