Using a CSV file to create a stem plot - r

I'm new to R (and anything programming related) so am getting my head around what is actually happening.
I created a CSV file in Excel with one column consisting of names and the other column consisting of pretend exam scores for each name. I saved it from Excel as a CSV file.
So to import it into R I used the following command:
data1 <- read.csv(file.choose(), header = FALSE)
When I created the CSV file I didn't create any headers so the column for names is given the header V1 and the column for the exam scores is given the header V2.
So to create my stem plot I then use the command:
class_stem <- stem(data1$V2)
Is this the most efficient way to do this?
My real confusion starts when I import the data as a table. Should I even be importing it as a table, or just leaving it as I had done? My purpose at this stage was just to create a stem-and-leaf plot.
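For reference, the whole workflow is only two steps. Here is a minimal, self-contained sketch: the data frame below stands in for what `read.csv(..., header = FALSE)` returns (columns `V1`, `V2`), and the names and scores are made up.

```r
# Stand-in for the result of read.csv(file.choose(), header = FALSE):
data1 <- data.frame(V1 = c("Ann", "Bob", "Cat", "Dan", "Eve"),
                    V2 = c(67, 72, 75, 81, 88))

# stem() prints the stem-and-leaf display to the console and returns NULL,
# so assigning its result (class_stem <- stem(...)) stores nothing useful.
stem(data1$V2)
```

There is no need to convert the data frame to a table first; `stem()` only needs the numeric vector.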


Trying to create new columns using header information, add a column containing the file name and merge multiple csv files in R

I have only recently started using R and am now trying to automate some tasks with it. I've a task where I want to merge information from ~300 .csv files. Each file is in the same format with information in a header section followed by data in standard columns.
I want to
Create a new column that contains the file name
Create columns that use header information (e.g. lot number) on each row in the file
Merge all csv files in a folder together.
I've seen bits of code that can merge csv files together using list.files(), lapply() and bind_rows(), but I'm struggling to get the header information into new columns before merging the csv files together.
[sample of csv file shown as a screenshot]
Has anyone a solution to this?
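Absent the real header layout, here is a hedged base-R sketch. The line numbers, the `Lot,` prefix, and the folder path are all assumptions to adjust: it pulls the lot number from line 2 of the header section, reads the data table whose column names sit on line 5, and adds the file name and lot number as columns on every row before binding everything together.

```r
# Hedged sketch, base R only. Assumed file layout (adjust to yours):
#   lines 1-4: header section, with the lot number on line 2 as "Lot,12345"
#   line 5:    column names of the data table, data below
read_one <- function(path) {
  header <- readLines(path, n = 4)
  lot    <- sub("^Lot,", "", header[2])   # value after the "Lot," prefix
  dat    <- read.csv(path, skip = 4)      # skip the header section
  dat$lot_number <- lot                   # header info repeated on each row
  dat$file_name  <- basename(path)        # new column with the file name
  dat
}

files  <- list.files("path/to/folder", pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read_one))
```

The same `lapply()` result also feeds `dplyr::bind_rows()` if you prefer that over `do.call(rbind, ...)`.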

crawl excel data automatically based on contents

I want to get data from many excel files with the same format like this:
What I want is to output the ID data from column B to a CSV file. I have many files like this, and for each file the number of columns may not be the same, but the ID data will always be in column B.
Is there a package in Julia that can crawl data in this format? If not, what method should I use?
You can use the XLSX package.
If the file in your screenshot is called JAKE.xlsx and the data shown is in a sheet called DataSheet:
data = XLSX.readtable("JAKE.xlsx", "DataSheet")
# `data[1]` is a vector of vectors, each with data for a column.
# that way, `data[1][2]` corresponds to column B's data.
data[1][2]
This should give you access to a vector with the data you need. After getting the IDs into a vector, you can use the CSV package to create an output file.
If you add a sample xlsx file to your post it might be possible to give you a more complete answer.

Importing CSV files in R leads to unnecessary variables & observations

Here are the files I'm trying to import:
Data
Included are two files, xlsx and CSV, that represent the same dataset. Although they contain the same information, I get different results when I import them into R. With the read_excel(file.choose()) command I can import the xlsx file correctly, but if I use the read.csv(file.choose(), sep=";") command on the CSV file, I get unnecessary additional observations and variables. I only saved the Excel file as a comma-separated values file (.csv), so R should construct the same data frame. What did I do wrong?
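A common cause is Excel writing trailing semicolons and blank rows into the exported file, which read.csv turns into extra all-NA variables and observations. Here is a hedged sketch of a clean-up step for that case (`drop_empty` is a made-up helper name, and the file name in the usage comment is a placeholder):

```r
# Drop rows and columns that are entirely empty or NA, as Excel-exported
# CSVs often carry trailing separators and blank lines.
drop_empty <- function(df) {
  blank <- sapply(df, function(col) is.na(col) | trimws(as.character(col)) == "")
  blank <- matrix(blank, nrow = nrow(df))   # keep matrix shape for 1-row frames
  df[rowSums(!blank) > 0, colSums(!blank) > 0, drop = FALSE]
}

# Usage (placeholder file name); read.csv2() defaults to sep = ";":
# df <- drop_empty(read.csv("dataset.csv", sep = ";"))
```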

Vcorpus Rstudio combining .txt files

I have a directory of .txt files and need to combine them into one file; each file would be a separate line. I tried:
new_corpus <-VCorpus(DirSource("Downloads/data/"))
The data is in the directory, but I get an error:
Error in DirSource(directory = "Downloads/data/") :
empty directory
This is a bit basic, but I was only given this information on how to create the corpus. What I need to do is take these files and create one factor that is the .txt content and another that is an ID, in the form of:
ID .txt
ID .txt
.......
EDIT: To clarify, in response to emilliman5's comment:
I need both a data frame and a corpus. The example I am working from used a csv file with the data already tagged for a Naive Bayes problem. I can work through that example and all the steps. The data I have is in a different format. It is 2 directories (/ham and /spam) of short .txt files. I was able to create a corpus, when I changed my command to:
new_corpus <-VCorpus(DirSource("~/Downloads/data/"))
I have cleaned the raw data and can make a DTM, but at the end I will need to create a crossTable with the labels spam and ham. I do not understand how to insert that information into the corpus.
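One hedged way to keep the labels alongside the corpus is to build a data frame first and the corpus from its text column (this assumes the tm package and the ~/Downloads/data/ham and ~/Downloads/data/spam layout described above; `read_dir` is a made-up helper name):

```r
library(tm)

# One row per .txt file: its ID (file name), its text, and its label.
read_dir <- function(dir, label) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  data.frame(ID    = basename(files),
             text  = vapply(files,
                            function(f) paste(readLines(f, warn = FALSE),
                                              collapse = " "),
                            character(1), USE.NAMES = FALSE),
             label = rep(label, length(files)),
             stringsAsFactors = FALSE)
}

msgs <- rbind(read_dir("~/Downloads/data/ham",  "ham"),
              read_dir("~/Downloads/data/spam", "spam"))

# The labels live in the data frame, not inside the corpus; build the
# corpus from the text column and use msgs$label later for the crossTable.
new_corpus <- VCorpus(VectorSource(msgs$text))
```

Documents in the corpus stay in the same order as the rows of `msgs`, so `msgs$label` lines up with the DTM rows for the Naive Bayes step.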

read a selected column from multiple csv files and combine into one large file in R

Hi,
My task is to read selected columns from over 100 identically formatted .csv files in a folder and cbind them into one large file using R. I have attached a screenshot of a sample data file to this question.
This is the code I'm using:
filenames <- list.files(path = "G:\\2014-02-04")
mydata <- do.call("cbind", lapply(filenames, read.csv, skip = 12))
My problem is, for each .csv file I have, the first column is the same. So using my code will create a big file with duplicate first columns... How can I create a big file with just a single column A (no duplicates)? And I would like to name the second column read from each .csv file using the value of cell B7, which is the specific timestamp of each .csv file.
Can someone help me on this?
Thanks.
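Here is one hedged approach. It assumes each file keeps the shared column first, the timestamp in cell B7 (i.e. the second comma-separated field of line 7), and the data table after 12 header lines; adjust those numbers to the real layout (`read_one` is a made-up helper name):

```r
paths <- list.files("G:\\2014-02-04", pattern = "\\.csv$", full.names = TRUE)

read_one <- function(path) {
  stamp <- strsplit(readLines(path, n = 7)[7], ",")[[1]][2]  # value of cell B7
  dat   <- read.csv(path, skip = 12)
  names(dat)[2] <- stamp        # label the value column with the timestamp
  dat
}

tables <- lapply(paths, read_one)
if (length(tables)) {
  # keep the shared first column once, then bind only each file's value column
  big <- do.call(cbind, c(tables[[1]][1], lapply(tables, `[`, 2)))
}
```

Reading every file once into `tables` and combining at the end avoids re-reading the shared column 100 times over.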
