Vcorpus Rstudio combining .txt files - r

I have a directory of .txt files and need to combine them into one file, where each file would be a separate line. I tried:
new_corpus <-VCorpus(DirSource("Downloads/data/"))
The data is in the directory, but I get an error:
Error in DirSource(directory = "Downloads/data/") :
empty directory
This is a bit basic but I was only given this information on how to create the corpus. What I need to do is take this file and create one factor that is the .txt and another with an ID, in the form of:
ID .txt
ID .txt
.......
EDIT: To clarify, in response to emilliman5's comment:
I need both a data frame and a corpus. The example I am working from used a csv file with the data already tagged for a Naive Bayes problem. I can work through that example and all the steps. The data I have is in a different format. It is 2 directories (/ham and /spam) of short .txt files. I was able to create a corpus, when I changed my command to:
new_corpus <-VCorpus(DirSource("~/Downloads/data/"))
I have cleaned the raw data and can make a DTM, but at the end I will need to create a crossTable with the labels spam and ham. I do not understand how to insert that information into the corpus.
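One way to keep the labels is to build the data frame first and derive the corpus from it, so the rows stay aligned. Below is a minimal sketch in base R, assuming each directory name (ham, spam) is the label. The names base_dir, read_labelled and messages are made up for illustration, and the demo files are written to a temporary directory only so the example runs on its own.

```r
# Build a small throwaway directory layout so the sketch is runnable;
# in practice base_dir would point at the real data, e.g. "~/Downloads/data".
base_dir <- file.path(tempdir(), "data_demo")
for (lab in c("ham", "spam")) {
  dir.create(file.path(base_dir, lab), recursive = TRUE, showWarnings = FALSE)
}
writeLines("see you at lunch", file.path(base_dir, "ham", "0001.txt"))
writeLines(c("WIN a prize", "call now"), file.path(base_dir, "spam", "0002.txt"))

# Read every .txt file in one label directory: one row per file,
# with the file's lines collapsed into a single string.
read_labelled <- function(dir, label) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  text <- vapply(files,
                 function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                 character(1))
  data.frame(id = basename(files),
             text = unname(text),
             label = rep(label, length(files)),
             stringsAsFactors = FALSE)
}

messages <- rbind(read_labelled(file.path(base_dir, "ham"), "ham"),
                  read_labelled(file.path(base_dir, "spam"), "spam"))
messages$label <- factor(messages$label)
```

With tm loaded, VCorpus(VectorSource(messages$text)) should give a corpus in the same row order, so messages$label lines up with the rows of the DTM and can be used for the final table.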

Related

Trying to create new columns using header information, add a column containing the file name and merge multiple csv files in R

I have only recently started using R and am now trying to automate some tasks with it. I've a task where I want to merge information from ~300 .csv files. Each file is in the same format with information in a header section followed by data in standard columns.
I want to
Create a new column that contains the file name
Create columns that use header information (e.g. lot number) on each row in the file
Merge all csv files in a folder together.
I've seen bits of code that can merge csv files together using list.files(), lapply() and bind_rows(), but I am struggling to get the header information into new columns before merging the csv files together.
sample of csv file
Has anyone a solution to this?
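A possible approach, sketched in base R. Since the sample file is not shown here, the layout is an assumption: two header lines of the form key,value (the first holding the lot number), followed by the column names and the data. The names demo_dir, n_header and read_one are hypothetical, and the demo files exist only to make the sketch runnable.

```r
# Create two demo files in the assumed layout: 2 header lines of "key,value",
# then the column names and the data rows.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Lot Number,A17", "Operator,Kim", "time,value", "1,0.5", "2,0.7"),
           file.path(demo_dir, "run1.csv"))
writeLines(c("Lot Number,B42", "Operator,Lee", "time,value", "1,0.9"),
           file.path(demo_dir, "run2.csv"))

n_header <- 2  # assumed number of header lines; adjust to the real files

# Read one file: pull the lot number out of the header section,
# read the data below it, and add both new columns to every row.
read_one <- function(path) {
  header <- strsplit(readLines(path, n = n_header), ",", fixed = TRUE)
  dat <- read.csv(path, skip = n_header, stringsAsFactors = FALSE)
  dat$lot_number <- header[[1]][2]
  dat$file_name <- basename(path)
  dat
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read_one))
```

dplyr::bind_rows(lapply(files, read_one)) could replace the do.call(rbind, ...) step if the files do not all have identical columns.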

Converting RData to CSV file returns incorrect CSV file

I do not have any expertise in R and I have to convert RData files to CSV to analyze the data. I followed these links to do this: Converting Rdata files to CSV and "filename.rdata" file Exploring and Converting to CSV. The second option seemed simpler, as I failed to understand the first one. This is what I have tried so far, along with the results:
> ddata <- load("input_data.RData")
> print(ddata)
[1] "input_data"
> print(ddata[[1]])
[1] "input_data"
> write.csv(ddata, "test.csv")
From the first link I learnt that we can see the RData type and when I did str(ddata) I found out that it is a List of size 1. Hence, I checked to see if print(ddata[[1]]) would print anything apart from just "input_data". With the write.csv I was able to write it to a csv without any errors but it has just the following 2 lines inside the CSV file:
"","x"
"1","input_data"
Can you please help me understand what I am doing wrong and show a way to get all the details in a csv?
The object ddata contains only the name(s) of the object(s) that load() created, not the data itself. Try typing the command ls(). That should give you the names of the objects in your environment. One of them should be input_data. That is the object. If it is a data frame (check with str(input_data)), you can create the csv file with
write.csv(input_data, "test.csv")
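If the object name is not known in advance, the value returned by load() can be used to fetch it. A minimal sketch, assuming the file holds a single data frame; the demo file, env, loaded_names and csv_path are made up here so the example runs on its own.

```r
# Make a demo .RData file so the sketch runs on its own
input_data <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))
rdata_path <- file.path(tempdir(), "input_data.RData")
save(input_data, file = rdata_path)
rm(input_data)

# load() restores the objects into env and returns only their names
env <- new.env()
loaded_names <- load(rdata_path, envir = env)

# Fetch the actual object by name, then write it out if it is a data frame
obj <- get(loaded_names[1], envir = env)
csv_path <- file.path(tempdir(), "test.csv")
if (is.data.frame(obj)) {
  write.csv(obj, csv_path, row.names = FALSE)
}
```

Loading into a separate environment is optional; it just avoids overwriting objects that already exist in the workspace.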

How to write table on Juliabox?

I define a DataFrame named data and want to write it into .csv file. I used writetable("result_data.csv", data) but it doesn't work.
This is the dataframe
error details
To write a data frame to disk you should use the CSV.jl package like this (also make sure that you have write permission for the directory where you want to save the file on Juliabox):
using CSV
CSV.write("result_data.csv", data)
If this fails, please report back in a comment and I will investigate further.

Is there a way to read multiple excel files into R, but only up to a certain creation date? (Note: Date does not exist within the actual excel files.)

I have multiple excel files in multiple directories that I am reading into R. However, I don't want to read in EVERY excel file; I only want to read in the most recent ones (for example, only the ones created in the last month). Is there a way to do this?
Currently I am using this to read in all of the excel files, which is working just fine:
filenames <- Sys.glob(file.path('(name of dir)', "19*", "Electrode*02.xlsx"))
elecsheet <- do.call("cbind", lapply(filenames, read_excel))
Somewhere in this second line of code (I think), I need to tell R to look at the metadata and only read in the excel files that have been created since a certain date.
Thank you!
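One option is to filter the file names on their file-system timestamps before reading anything. The sketch below uses file.info() from base R and filters on mtime (last modification). The ctime column could be used instead, but it is the creation time only on Windows; on Unix-alikes it is the last status change. The demo directory, file names and 30-day cutoff are assumptions for illustration.

```r
# Create two empty demo files and backdate one of them so the sketch is runnable
demo_dir <- file.path(tempdir(), "xlsx_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_file <- file.path(demo_dir, "Electrode_old_02.xlsx")
new_file <- file.path(demo_dir, "Electrode_new_02.xlsx")
file.create(old_file, new_file)
Sys.setFileTime(old_file, Sys.time() - 90 * 24 * 3600)  # pretend it is 90 days old

# Same globbing idea as in the question, then keep only recently modified files
filenames <- Sys.glob(file.path(demo_dir, "Electrode*02.xlsx"))
cutoff <- Sys.time() - 30 * 24 * 3600  # assumed cutoff: the last 30 days
info <- file.info(filenames)
recent <- filenames[info$mtime >= cutoff]

# recent can then replace filenames in lapply(recent, read_excel)
```

The filter goes in before the second line of the original code rather than inside it, so read_excel only ever sees the files that passed the date check.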

csv not retaining format after splitting in R

I have a csv file say named abc.csv where name of a column is Component.Number and there are 20 different components I'm working with. The csv file contains 2000 entries which are sorted by component number. I'm using the following code to split the csv into 20 csv files where one file contains only data corresponding to a particular Component.Number.
abc = read.csv("abc.csv")
for (name in levels(abc$Component.Number)){
  tmp=subset(abc,Component.Number==name)
  #Create a new filename for each Component - the folder 'skews' should already exist in the same directory
  fn=paste('skews/',gsub(' ','',name),sep='')
  #Save the CSV file containing separate expenses data for each Component
  write.table (tmp,fn,row.names=FALSE,sep = ",")
}
The code is working fine and I'm getting split files in the "skews" folder, but the files are not in csv format; in fact they don't have any file type. I have also tried write.csv instead of write.table in the last line, but no luck. So, how do I get the split files in .csv format and run the same R code on all of them using some kind of loop? The file names are the different component numbers. Thanks.
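The files have no type because the file name is built without an extension; write.table and write.csv never add one. A minimal sketch of one fix, appending ".csv" when building the name and then looping over the results. The small abc data frame, the Skew column and out_dir are stand-ins for the real data. unique() is used instead of levels() because, since R 4.0, read.csv no longer turns text columns into factors by default.

```r
# Stand-in for read.csv("abc.csv") so the sketch runs on its own
abc <- data.frame(Component.Number = c("C 1", "C 1", "C 2"),
                  Skew = c(0.1, 0.2, 0.3),
                  stringsAsFactors = FALSE)
out_dir <- file.path(tempdir(), "skews")
dir.create(out_dir, showWarnings = FALSE)

for (name in unique(as.character(abc$Component.Number))) {
  tmp <- subset(abc, Component.Number == name)
  # Append the extension explicitly; the writer uses the name exactly as given
  fn <- file.path(out_dir, paste0(gsub(" ", "", name), ".csv"))
  write.csv(tmp, fn, row.names = FALSE)
}

# Run the same code on every split file: here, just count the rows in each
split_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
row_counts <- vapply(split_files, function(f) nrow(read.csv(f)), integer(1))
```

The function passed to vapply() is where the per-file analysis would go; lapply() works the same way if each file should return a data frame rather than a single number.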

Resources