Read text file using R into one column - r

I'm using:
x<-read.table(file,sep="")
to read space-delimited numbers from a .txt file, but the data comes back in multiple columns because the text file contains multiple lines (all holding the same type of data).
How can I read all of the numbers in the different lines into one column only?

You can use ?scan:
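x <- scan(file, what = numeric())
or something similar, depending on the structure of your file, should work. (Note that what = "numeric" would read the values as character strings, because what takes an example object, not a type name.) You might need to check or adjust the sep argument.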
Description of scan:
Read data into a vector or list from the console or file.
If you want x as a column in a data.frame, you can do
dat <- data.frame(x)
afterwards.
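As a minimal sketch of the approach above, assuming a small space-delimited file with lines of different lengths (the temporary file and its contents are made up for illustration):

```r
# Hypothetical input: three lines holding different counts of numbers.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5", "6 7 8 9"), tmp)

# scan() ignores line boundaries and returns one flat numeric vector.
x <- scan(tmp, what = numeric(), quiet = TRUE)

# Wrap the vector in a data frame to get a single column.
dat <- data.frame(x)
str(dat)
```

Because scan() treats any whitespace, including newlines, as a separator by default, the number of values per line does not matter.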

Related

Export to CSV, keeping leading zeros when opened in Excel

I have a series of massive data files that range in size from 800k to 1.4M rows, and one variable in particular has a set length of 12 characters (numeric data, but padded with leading zeros where the number has fewer than 12 digits). The column should look like this:
col
000000000003
000000000102
000000246691
000000000042
102851000324
etc.
I need to export these files for a client to a CSV file, using R. The final data NEEDS to retain the 12-character structure, but when I open the CSV files in Excel, the zeros disappear. This happens even after converting the entire data frame to character. The code I am using to do this is as follows.
df1 <- df1 %>%
  mutate(across(everything(), as.character))
##### I did this for all data frames #####
export(df1, "df1.csv")
export(df2, "df2.csv")
....
export(df17, "df17.csv")
I've read a few other posts that say this is an Excel problem, and that makes sense, but given the number of data files and amount of data, as well as the need for the client to be able to open it in Excel, I need a way to do it on the front end in R. Any ideas?
Yes, this is definitely an Excel problem!
To demonstrate: in Excel, enter your column values, save the file as CSV, and then re-open it in Excel. The leading zeros will disappear.
One option is to add a leading non-numeric character such as ':
paste0("\' ", df$col)
Not a great option, but an option.
A slightly better option is to wrap the character string in Excel's TEXT function. Excel will then evaluate the function when the file is opened.
df$col <- paste0("=Text(", df$col, ", \"000000000000\")")
#or
df$col <- paste0("=\"", df$col, "\"")
write.csv(df, "df2.csv", row.names = FALSE)
Of course, if the CSV file is then saved from Excel and reopened, the leading zeros will again disappear.
Another option is to investigate saving the file directly as an .xlsx file with the "writexl", "xlsx", or a similar package.
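To see what the ="..." trick actually puts on disk, here is a small base-R sketch. The data frame is a toy stand-in built from the values in the question, and the temporary output path is hypothetical.

```r
# Toy data standing in for the real column (values taken from the question).
df <- data.frame(col = c("000000000003", "000000000102", "102851000324"),
                 stringsAsFactors = FALSE)

# Wrap each value as ="..." so Excel treats it as a formula returning text.
out <- df
out$col <- paste0("=\"", out$col, "\"")

# write.csv quotes character fields and doubles any embedded quotes.
csv_path <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(out, csv_path, row.names = FALSE)

# Inspect the raw lines that Excel will parse.
csv_lines <- readLines(csv_path)
print(csv_lines)
```

Checking the raw lines first is a cheap way to confirm the zeros survive the export before sending files to the client; how Excel displays them still has to be verified in Excel itself.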

R / openxlsx / Finding the first non-empty cell in Excel file

I'm trying to write data to an existing Excel file from R, while preserving the formatting. I'm able to do so following the answer to this question (Write from R into template in excel while preserving formatting), except that my file includes empty columns at the beginning, and so I cannot just begin to write data at cell A1.
As a solution I was hoping to be able to find the first non-empty cell, then start writing from there. If I run read.xlsx(file="myfile.xlsx") using the openxlsx package, the empty columns and rows are automatically removed, and only the data is left, so this doesn't work for me.
So I thought I would first load the worksheet using wb <- loadWorkbook("file.xlsx") so I have access to getStyles(wb) (which works). However, the subsequent command getTables returns character(0), and wb$tables returns NULL. I can't figure out why. Am I right that these variables would tell me the first non-empty cell?
I've tried manually removing the empty columns and rows preceding the data, straight in the Excel file, but that doesn't change things. Am I on the right path here or is there a different solution?
As suggested by Stéphane Laurent, the package tidyxl offers the perfect solution here.
For instance, I can now search the Excel file for a character value, like my variable names of interest ("Item", "Score", and "Mean", which correspond to the names() of the data.frame I want to write to my Excel file):
library(tidyxl)
colnames <- c("Item","Score","Mean")
excelfile <- "FormattedSheet.xlsx"
x <- xlsx_cells(excelfile)
# Find all cells with character values: return their address (i.e., Cell) and character (i.e., Value)
chars <- x[x$data_type == "character", c("address", "character")]
starting.positions <- unlist(
  chars[which(chars$character %in% colnames), "address"]
)
# returns the addresses "C6", "D6", "E6"
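Once the cell addresses are known, they usually have to be turned into numeric row and column indices before writing (openxlsx's writeData, for example, takes startRow and startCol). A rough base-R sketch of that conversion follows; address_to_rowcol is a hypothetical helper, not part of tidyxl or openxlsx.

```r
# Hypothetical helper: convert A1-style addresses such as "C6" into
# numeric row and column indices.
address_to_rowcol <- function(address) {
  letters_part <- toupper(gsub("[0-9]", "", address))
  row <- as.integer(gsub("[A-Za-z]", "", address))
  # Treat the column letters as a base-26 number (A = 1, ..., Z = 26).
  col <- vapply(strsplit(letters_part, ""), function(chars) {
    Reduce(function(acc, l) acc * 26 + match(l, LETTERS), chars, 0)
  }, numeric(1))
  data.frame(address = address, row = row, col = col,
             stringsAsFactors = FALSE)
}

rc <- address_to_rowcol(c("C6", "D6", "E6", "AA10"))
print(rc)

# The first header cell could then serve as the write position, e.g.
# writeData(wb, sheet, x, startCol = rc$col[1], startRow = rc$row[1])
```

The data frame returned by xlsx_cells also carries numeric row and col columns, so selecting those alongside address may make the conversion unnecessary.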

Read CSV and assign values to r variables

I'm new to R, and I wonder how to read a CSV file and assign values from it to variables. For example, I have a CSV file and I want to assign the filename and filepath it contains to R variables. I know how to read a CSV into an R variable with
mydata <- read.csv("testing.csv")
But how do I assign the value of Filename, which is 'globaldata.txt', and the value of Filepath, which is 'E:\Test\Global', to R variables?
variable value
Filename globaldata.txt
Filepath E:\Test\Global
It's safe to use read.table and define the class of each variable in the colClasses argument; see the help file ?read.table
mydata <- read.table("testing.csv", sep = ",", header = TRUE, colClasses = c("character", "character"))
The return value mydata will be a data frame, and you can simply extract what you want using the $ operator,
e.g.
value1 <- mydata$column1
etc.
You can do the following :
Filename<-"globaldata.csv" # if this is a csv and not a .txt file
Filepath<-"E:/Test/Global/" # on Windows, write the separator as "/" (or as an escaped "\\") in R strings
which then allows you to do (if this is what you want)
mydata<-read.csv(paste0(Filepath,Filename))
EDIT
If I understand correctly you have a csv file named testing.csv with two columns: one with Filenames and one with Filepaths.
In that case, after mydata<-read.csv("testing.csv") you have a data frame with two columns. To access the first one you use mydata[,1], and for the second (Filepath) mydata[,2]. If you want the Filename of the third entry in the file, you use mydata[3,1] (before the comma is the row, after it is the column).
I hope this is what you are looking for; otherwise I'm afraid I misunderstood you again. Having a look at the CSV file would help to better understand the question.
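For the layout shown in the question (one row per setting, with a variable column and a value column), a lookup by name may be closer to what was asked. This is only a sketch: it assumes the file is comma-separated with the header variable,value, and it writes a temporary stand-in file for testing.csv.

```r
# Hypothetical stand-in for testing.csv, assuming a comma-separated
# layout with a "variable,value" header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("variable,value",
             "Filename,globaldata.txt",
             "Filepath,E:\\Test\\Global"), tmp)

mydata <- read.csv(tmp, stringsAsFactors = FALSE)

# Look up each setting by the name stored in the first column.
Filename <- mydata$value[mydata$variable == "Filename"]
Filepath <- mydata$value[mydata$variable == "Filepath"]

# Combine them into a single path for later use.
full_path <- file.path(Filepath, Filename)
print(full_path)
```

Looking values up by name rather than by position keeps the code working if the rows in the file are reordered.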

Inconsistency between 'read.csv' and 'write.csv' in R

The R function read.csv works as follows, as stated in the manual: "If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names." That's good. However, when it comes to the function write.csv, I cannot find a way to write the CSV file in a similar way. So, if I have a file.txt as below:
Column_1,Column_2
Row_1,2,3
Row_2,4,5
Then when I read it using a = read.csv('file.txt'), the row and column names are Row_x and Column_x as expected. However, when I write the matrix a to a CSV file again, what I get as a result from write.csv(a, 'file2.txt', quote=F) is as below:
,Column_1,Column_2
Row_1,2,3
Row_2,4,5
So, there is a comma at the beginning of this file. And if I read this file again using a2 = read.csv('file2.txt'), the resulting a2 will not be the same as the previous matrix a: the row names of a2 will not be Row_x. That is, I do not want a comma at the beginning of the file. How can I get rid of this comma while using write.csv?
The two functions you mention, read.csv and write.csv, are just specific forms of the more generic functions read.table and write.table.
When I copy your example data into a .csv file and read it with read.csv, R throws a warning saying that the header line was incomplete, and it falls back on special behaviour to cope. Because the header is one field short, R completes it by adding an empty element at the top left. R knows this is a header row, so the data looks fine in R, but when the data is written back to a CSV, the empty element that R created in the header row shows up as a regular element, which is what you would expect. In effect, R turned the table into a 3x3 grid because every row has to have the same number of elements.
You want the extra comma there, because it lets other programs read the column names in the right place. To read the file back in, you can do the following, assuming foo.csv is your data. Alternatively, you can fix this by manually setting the column and row names in R, including the missing element, to put everything in place.
To fix the wonky row names, add an extra option specifying which column holds the row names (row.names = your_column_number) when you read the file back in with the comma correctly in place.
y <- read.csv(file = "foo.csv") # this throws a warning because the header is one field short
write.csv(y, "foo_out.csv")
x <- read.csv(file = "foo_out.csv", header = TRUE, row.names = 1) # this reads the first column as the row names
Play around with read.csv and write.csv, but it might be worthwhile to move to the more generic functions read.table and write.table. They offer expanded functionality.
To read a CSV with the generic function:
y <- read.table(file = "foo.csv", sep = ",", header = TRUE)
This way you can specify the delimiter and just as easily read tab-delimited text exported from a spreadsheet (sep = "\t") or space-delimited files (sep = " ").
Hope that helps.
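Both routes can be checked with a quick round trip. The sketch below recreates the question's file in a temporary location (all paths are hypothetical). It relies on write.table leaving the header one field short when row.names = TRUE and col.names = TRUE, which is the layout the question asks for.

```r
# Recreate the example file from the question in a temporary location.
src <- tempfile(fileext = ".txt")
writeLines(c("Column_1,Column_2", "Row_1,2,3", "Row_2,4,5"), src)
a <- read.csv(src)

# Route 1: write.table writes the header without a leading empty field.
out1 <- tempfile(fileext = ".txt")
write.table(a, out1, sep = ",", quote = FALSE)
header1 <- readLines(out1)[1]
a2 <- read.csv(out1)

# Route 2: keep write.csv's leading comma and name the row-name column
# explicitly when reading back.
out2 <- tempfile(fileext = ".txt")
write.csv(a, out2, quote = FALSE)
a3 <- read.csv(out2, row.names = 1)

print(header1)
print(all.equal(a, a2))
print(all.equal(a, a3))
```

Route 1 reproduces the original file layout, while Route 2 produces the layout that spreadsheet programs expect.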

Importing Multiple Text Files as Individual data.frames in R - 2.15.2 on Mac - 10.8.4

I have searched through this forum for most of the day trying to find the solution - I could not find so I am posting. If the answer is already out there please point me in the right direction.
What I have -
A directory with 40 text files named as follows:
test_63x_disc_z00.txt
*01.txt
*02.txt
...
*39.txt
In each of these files there are 10 columns of data with no header and a varying number of rows.
What I want -
I want to have an individual data.frame in R for each text file with names:
file00
file01
...
file39.
I then want to add a header row to each of these data.frames.
I then want to be able to manipulate the data at ease (this last part I can sort out once I have input the data).
This is what I have accomplished (don't laugh now) -
I can input a single text file as a data frame and add a header, like so :
d<-read.delim("test_63x_disc_z00.txt", header = F)
colnames(d)<-c("cell","CentX","CentY","CountLabels","AvgGreen","DeviationsGreen","AvgRed","DeviationsRed","GUI-ID","Slice")
I am not sure how to set up a loop to perform each of the commands to all 40 files and maintain distinct file names.
To quickly read in a lot of data frames you can:
listy <- lapply(list.files(), read.table, sep = "", header = FALSE)
Then, to name the list items, you can:
names(listy) <- sprintf("file%02d", seq_along(listy) - 1)
They are then accessed as listy$file00, listy$file01, etc.
Thanks Metrics for editing my input code... I wasn't sure how to give it the format you did.
So I figured it out (dirty version), but it still needs work -
stem <- c("/Users/stefanzdraljevic/Northwestern/2013/Carthew-Rotation/Sample-Images-Ritika/ritika-tes/statistics3/test_63x_disc_z")
The above is the stem for the file naming.
addition <- c("00","01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","34","35","36","37","38","39")
This will add the number of the text file to the end of stem. I am not sure how to incorporate the "00" numbering structure without writing them all out.
colnames <- c("cell","CentX","CentY","CountLabels","AvgGreen","DeviationsGreen","AvgRed","DeviationsRed","GUI-ID","Slice")
This will add column names to the data.frame
data <- list()
for (j in stem) {
  data[[j]] <- list()
  for (i in addition) {
    data[[j]][[i]] <- read.table(paste0(j, i, ".txt"), header = FALSE, col.names = colnames)
  }
}
This loop does the trick.
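The "00" numbering does not have to be typed out by hand: sprintf("%02d", ...) generates zero-padded suffixes. Below is a minimal sketch, assuming a temporary directory with three small two-column files in place of the real 40 ten-column files (the directory, data, and shortened column names are made up for illustration).

```r
# Hypothetical demo directory with three tiny tab-delimited files.
demo_dir <- tempfile("disc_demo")
dir.create(demo_dir)
suffixes <- sprintf("%02d", 0:2)  # "00", "01", "02"; use 0:39 for 40 files
paths <- file.path(demo_dir, paste0("test_63x_disc_z", suffixes, ".txt"))
for (p in paths) {
  write.table(data.frame(a = 1:2, b = 3:4), p, sep = "\t",
              row.names = FALSE, col.names = FALSE)
}

# Shortened, hypothetical header; the real files need all ten names.
col_names <- c("cell", "CentX")

# Read every file into one named list of data frames.
frames <- lapply(paths, read.delim, header = FALSE, col.names = col_names)
names(frames) <- paste0("file", suffixes)

str(frames$file01)
```

Keeping the data frames in a named list (frames$file00, frames$file01, ...) instead of 40 separate variables makes it easy to apply the same operation to all of them with lapply.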
