I'm currently trying to read two .csv files, edit the data, and then write it into a new .csv.
Here is the code:
data <- read.csv("file.csv", fill=TRUE, header=TRUE, row.names=NULL, stringsAsFactors=FALSE, sep=",", quote="")
write.csv(data, file="out.csv")
Here's the problem:
Everything is fine with the first file (20 columns, 572 observations).
However, the other file has 163 columns and 1578 lines, but when I read it with read.csv, R displays "2301 observations of 163 variables".
I tried to write this data frame into a new csv file, and it is a total mess:
the rows have not been written entirely, the last values are written on a new row
there is a new column with integers from 1 to 2301
some data which is supposed to be in file$n is written in file$(n-1) or file$(n-2)
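From what I have read so far, a couple of checks might narrow this down. This is only a sketch, assuming the big file is a plain comma-separated file; "file2.csv" stands in for its real name:
# With quote="" a quoted field that contains a comma or a line break gets
# split into extra rows, which would explain 2301 rows instead of 1578.
# Reading with the default quoting may restore the expected row count.
data2 <- read.csv("file2.csv", header=TRUE, stringsAsFactors=FALSE)
nrow(data2)
# count.fields() shows how many fields each line has, so split or shifted
# rows stand out immediately.
table(count.fields("file2.csv", sep=",", quote="\""))
# The extra column of integers 1 to 2301 in the output file is the row
# names; write.csv() can drop it.
write.csv(data2, file="out.csv", row.names=FALSE)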
I'm a newbie, and I must admit I'm kind of lost: any help would be highly appreciated!
Thanks
Clément
Related
I have ~3200 .xlsx files I want to merge into one file. These files consist of several columns which contain values with commas. After converting the files to .csv, most of the commas are changed to "." and the values are displayed correctly. In other columns, the commas are omitted, which leads to wrong values in those columns. However, this does not happen if the value can be rounded to .5 or .0.
Example:
time_elapsed x_pred_normalised
0 0,5153
0 0,5153
10,457283 0,7824
17,458956 0,8451
82,000000 0,4511
This is how it looks in the .xlsx file. After converting it to .csv, the same part of the file looks like this:
time_elapsed x_pred_normalised
0 0.5153
0 0.5153
10457283 0.7824
17458956 0.8451
82 0.4511
To convert the files from .xlsx to .csv I used R and this code:
library(readxl)
# increase max.print
options(max.print=2000)
# Create a vector of Excel files to read
files.to.read = list.files(pattern="xlsx")
# Read each file and write it to csv
lapply(files.to.read, function(f) {
df = read_excel(f, sheet=1)
write.csv(df, gsub("xlsx", "csv", f), row.names=FALSE)
})
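A variant that might avoid the type guessing (untested on the real files, and it assumes every column in the sheets is numeric) would read all columns as text and convert the decimal commas by hand:
library(readxl)
files.to.read = list.files(pattern="xlsx")
lapply(files.to.read, function(f) {
  # read everything as text so readxl does not guess the column types
  df = read_excel(f, sheet=1, col_types="text")
  # turn the decimal comma into a dot, then convert back to numbers
  df[] = lapply(df, function(col) as.numeric(gsub(",", ".", col)))
  write.csv(df, gsub("xlsx", "csv", f), row.names=FALSE)
})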
I am new to R (and anything related to programming) and I don't know how to fix this. I tried converting the files with the Windows terminal and also tried Batch convert Excel files of a folder to CSV files with VBA. Each of these options produces the same problem, but in different places in the file.
For example, the last option omitted the comma if the values in x_pred_normalised were > 1.
If there's anything I can do, please help me. This is part of the preprocessing of my eye-tracking data, which I need for my M.A. thesis.
I am looking for a method and not a code as a solution. Any suggestions are welcome.
Here is a sample of the corrupted data (the commas should not be there). By the way, I don't have any control over the csv files I receive.
A B C
1.1 1,859.3 52.1
0 12.2 123
In csv format it looks like:
A,B,C
1.1 ,1,859.3,52.1
0,12.2,123
But then, when I read it in using R, row 1 has an extra column, and that is an error. Is there any comfortable way to identify whether a csv file has an error like this extra column? I could write a bunch of nested loops that parse through the length of each row, but I am talking about 1000 csvs with 100,000 rows each; it would take forever. Please help. Any method is appreciated.
Save to csv using a different separator, e.g. ;
Then you would have something like
A;B;C
1.1;1,859.3;52.1
0;12.2;123
The code is simple; note that write.csv() ignores a sep argument, so use write.table() or write.csv2() for writing:
write.table(..., sep = ";")   # or write.csv2(...), which uses ";" by default
read.csv(..., sep = ";")      # or read.csv2(...)
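If the files arrive already corrupted and cannot be re-exported, one cheap check (a sketch, untested at scale; the comma separator and file pattern are assumptions) is to compare each line's field count against the header:
# rows whose field count differs from the header line
bad_rows <- function(path) {
  n <- count.fields(path, sep=",", quote="\"")
  which(n != n[1])
}
files <- list.files(pattern="\\.csv$")
problems <- lapply(files, bad_rows)
names(problems) <- files
problems[lengths(problems) > 0]   # only the files that contain bad rows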
It's my first post here. I'm a beginner in R, so maybe my problem is not very complicated.
I used the mygene package in R to create an annotation for our RNAseq results. It looks nice, and it generated a res list with 3 sublists, as follows (summary):
Length Class Mode
response 8 DataFrame S4
duplicates 255 data.frame list
missing 3584 -none- character
I was interested in the first list, response.
When I call it in R, it looks OK, but after exporting it as a txt file with this code:
write.table(res$response, "res.response_test.txt",quote=F,col.names = T,sep = "\t")
In the txt file, some lines look like they are broken in the wrong places.
I would be grateful for your help with exporting that result.
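One thing that might be worth trying (untested, and only a guess at the cause): with quote=F, any field that itself contains a tab or a line break is written unprotected, so the record breaks across lines. Converting the S4 DataFrame to a plain data.frame and keeping the default quoting, or stripping line breaks first, might keep each record on one line:
res.df <- as.data.frame(res$response)
# keep quoting so embedded tabs/newlines stay inside one field
write.table(res.df, "res.response_test.txt", quote=TRUE, col.names=TRUE,
            row.names=FALSE, sep="\t")
# alternatively, remove line breaks from character columns before writing
res.df[] <- lapply(res.df, function(col)
  if (is.character(col)) gsub("[\r\n]+", " ", col) else col)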
I am new to R programming. I have imported a csv file using the following function
PivotTest <- read.csv(file.choose(), header=T)
The csv file has 7 columns: Meter_Serial_No, Reading_Date, Reading_Description, Reading_value, Entry_Processed, TS_Inserted, TS_LastUpdated.
When importing, the Meter_Serial_No column is filled with zeros, while there is data in that column in the csv file. When I check what data is in that particular column (PivotTest$Meter_Serial_No), it returns NULL. Can anyone assist me, please?
Furthermore, the csv that I'm importing has more than 127,000 rows. When doing a test with 10 rows of data only, I don't have that problem where the column Meter_Serial_No is replaced with zero.
It depends on the class of the values in that column (PivotTest$Meter_Serial_No). I believe there is a problem with the type conversion; try the following.
PivotTest <- read.csv("test.csv", header=TRUE, colClasses=c(Meter_Serial_No="character"))  # unspecified columns keep the default conversion
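To check whether the conversion worked, inspecting the import afterwards should show Meter_Serial_No as character rather than all zeros (column name as given in the question):
str(PivotTest)                      # classes of all 7 columns
class(PivotTest$Meter_Serial_No)    # should be "character"
head(PivotTest$Meter_Serial_No)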
I have used R for various things over the past year but due to the number of packages and functions available, I am still sadly a beginner. I believe R would allow me to do what I want to do with minimal code, but I am struggling.
What I want to do:
I have roughly a hundred different excel files containing data on students. Each excel file represents a different school but contains the same variables. I need to:
Import the data into R from Excel
Add a variable to each file containing the filename
Merge all of the data (add observations/rows - do not need to match on variables)
I will need to do this for multiple sets of data, so I am trying to make this as simple and easy to replicate as possible.
What the Data Look Like:
Row 1 Title
Row 2 StudentID Var1 Var2 Var3 Var4 Var5
Row 3 11234 1 9/8/2011 343 159-167 32
Row 4 11235 2 9/16/2011 112 152-160 12
Row 5 11236 1 9/8/2011 325 164-171 44
Row 1 is meaningless and Row 2 contains the variable names. The files have different numbers of rows.
What I have so far:
At first I simply tried to import the data from Excel. Using the xlsx package, this works nicely:
dat <- read.xlsx2("FILENAME.xlsx", sheetIndex=1,
sheetName=NULL, startRow=2,
endRow=NULL, as.data.frame=TRUE,
header=TRUE)
Next, I focused on figuring out how to merge the files (I also thought this is where I should add the filename variable to the data files). This is where I got stuck.
setwd("FILE_PATH_TO_EXCEL_DIRECTORY")
filenames <- list.files(pattern=".xls")
do.call("rbind", lapply(filenames, read.xlsx2, sheetIndex=1, colIndex=6, header=TRUE, startrow=2, FILENAMEVAR=filenames));
I set my directory, make a list of all the Excel file names in the folder, and then try to merge them in one statement using a variable for the filenames.
When I do this I get the following error:
Error in data.frame(res, ...) :
arguments imply differing number of rows: 616, 1, 5
I know there is a problem with my application of lapply - the startrow is not being recognized as an option and the FILENAMEVAR is trying to merge the list of 5 sample filenames as opposed to adding a column containing the filename.
What next?
If anyone can refer me to a useful resource or function, critique what I have so far, or point me in a new direction, it would be GREATLY appreciated!
I'll post my comment as an answer (with bdemerast having picked up on the typo). The solution is untested, as xlsx will not run happily on my machine.
You need to pass a single FILENAMEVAR to read.xlsx2.
lapply(filenames, function(x) read.xlsx2(file=x, sheetIndex=1, colIndex=6, header=TRUE, startRow=2, FILENAMEVAR=x))
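Putting that back into the merge step from the question (untested here, for the same reason), the corrected lapply() call can feed straight into do.call():
filenames <- list.files(pattern=".xls")
merged <- do.call("rbind",
  lapply(filenames, function(x)
    read.xlsx2(file=x, sheetIndex=1, colIndex=6, header=TRUE,
               startRow=2, FILENAMEVAR=x)))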