I have ~3200 .xlsx files I want to merge into one file. These files consist of several columns whose values use a comma as the decimal separator. After converting the files to .csv, most of the commas are changed to "." and the values come out correctly. In other columns, however, the commas are simply dropped, which produces wrong values. Oddly, this does not happen if the value can be rounded to .5 or .0.
Example:
time_elapsed   x_pred_normalised
0              0,5153
0              0,5153
10,457283      0,7824
17,458956      0,8451
82,000000      0,4511
This is how it looks in the .xlsx file. After converting it to .csv, the same part of the file looks like this:
time_elapsed   x_pred_normalised
0              0.5153
0              0.5153
10457283       0.7824
17458956       0.8451
82             0.4511
To convert the files from .xlsx to .csv I used R with this code:
library(readxl)
# increase max.print
options(max.print = 2000)
# Create a vector of Excel files to read
files.to.read <- list.files(pattern = "\\.xlsx$")
# Read each file and write it to csv
lapply(files.to.read, function(f) {
  df <- read_excel(f, sheet = 1)
  write.csv(df, gsub("\\.xlsx$", ".csv", f), row.names = FALSE)
})
I am new to R (and to programming in general) and I don't know how to fix this. I tried converting the files with the Windows terminal and also tried "Batch convert Excel files of a folder to CSV files with VBA". Each of these options produces the same problem, but in different places in the file.
For example, the last option omitted the comma if the values in x_pred_normalised were >1.
If there's anything I can do, please help me. This is part of the preprocessing of my eye-tracking data, which I need for my M.A. thesis.
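One possible workaround, sketched under the assumption that the affected cells reach R as text with decimal commas rather than as true numbers: read every column as text so nothing gets guessed away, replace the commas, and only then convert to numeric.

library(readxl)

files.to.read <- list.files(pattern = "\\.xlsx$")

lapply(files.to.read, function(f) {
  # col_types = "text" keeps read_excel from guessing column types, so
  # values like "10,457283" survive unchanged (assumption: the commas
  # are still present at this point)
  df <- read_excel(f, sheet = 1, col_types = "text")
  # turn "10,457283" into 10.457283, column by column; purely textual
  # columns would become NA here, so adjust if some columns are not numbers
  df[] <- lapply(df, function(col) as.numeric(gsub(",", ".", col, fixed = TRUE)))
  write.csv(df, gsub("\\.xlsx$", ".csv", f), row.names = FALSE)
})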
Related
I have to read three different csv files in R to merge them later. Two files have data stored only in the first column, with all values separated by commas. Reading and merging these files works perfectly.
One file has data in different columns. I tried to read the file with the following code:
events <- read.csv(
  file = "XX",
  header = TRUE,
  sep = "\t",
  na.strings = "NA",
  fill = TRUE
)
All three files have headers, but the headers of the file with data in different cells all get merged together into one big word.
My problem: after reading with R, the format of the three files is different and I can't merge them.
Is my code for reading the file wrong? How do I read a csv file in R with data in different cells?
Thank you a lot!
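If the headers collapse into one big word, the file is probably not tab-separated at all. A quick check, as a sketch (reusing the "XX" placeholder from above): look at the raw first line, then read with the separator the file actually uses.

# peek at the raw first line to see which separator the file really uses
readLines("XX", n = 1)

# if the fields turn out to be comma-separated, drop sep = "\t"
events <- read.csv(
  file = "XX",
  header = TRUE,
  na.strings = "NA",
  fill = TRUE
)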
I have 30 .txt files that I need to read into a tibble. It's panel data, altogether about 108 MB.
The issue is that some files are read correctly with all values there, but in others values are read as NA although they are there! Also, the files include a lot of blank lines...
Here is what I use:
read_clean_table <- function(x) {
  x <- read.table(x, header = TRUE, fill = TRUE)
  x[-(1:4), ]  # first 4 rows are system data
}
library(purrr)    # set_names(), map_df()
library(dplyr)    # mutate()
library(stringr)  # str_replace_all()
filenames <- list.files(path = "./ML", pattern = "\\.txt$", full.names = TRUE)
# read the files and merge them into one table, first rows removed;
# FileName is the name of the source file
files <- filenames %>%
  set_names() %>%
  map_df(read_clean_table, .id = "FileName") %>%
  mutate(FileName = str_replace_all(basename(FileName), pattern = "\\.txt", ""))
I tried read.delim as well, with the same result...
This is what the issue looks like (screenshot).
Edited: added two files:
https://drive.google.com/drive/folders/1gDss6qV9aFUMpJFGHPMQZbTITJ9av-py?usp=sharing
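In case the NAs come from misaligned columns in combination with the blank lines, one thing worth trying (a sketch with data.table::fread, not the original approach) is to skip blank lines and pad short rows explicitly:

library(data.table)

read_clean_table <- function(x) {
  dt <- fread(x, header = TRUE, fill = TRUE, blank.lines.skip = TRUE)
  dt[-(1:4), ]  # first 4 rows are system data, as before
}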
I have 500 .csv files with data that looks like this:
(sample data screenshot)
I want to extract one cell (e.g. B4, which holds 0.477) per csv file and combine those values into a single csv. What are some recommendations on how to do this easily?
You can try something like this:
library(readr)  # for read_lines() and write_lines()
# store the names of the csv files in the folder as a character vector
all.fi <- list.files("/path/to/csvfiles", pattern = "\\.csv$", full.names = TRUE)
ans <- sapply(all.fi, function(i) {
  first4 <- read_lines(i, n_max = 4)  # read the first 4 lines of the file
  fourth <- first4[4]                 # keep only the 4th line (row 4)
  unlist(strsplit(fourth, ","))[2]    # split on commas, take the 2nd field (column B)
})
write_lines(ans, "/path/to/output.csv")
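Since ans is a named vector (sapply names each value after its file path), an optional variation keeps each value traceable to its source file:

# optional: write the source file name next to each extracted value
write.csv(data.frame(file = basename(names(ans)), value = ans),
          "/path/to/output.csv", row.names = FALSE)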
I cannot add a comment, so I will write my comment here.
Since your data is very large and loading the files individually is difficult, try this: Importing multiple .csv files into R. It covers the first part of your problem. For the second part, try this:
You can save your data as a data.frame (as in the comment by @Bruno Zamengo) and then use the select and merge functions in R to combine everything into a single csv file. With select and merge you can pick exactly the values you need and then combine them. I used this idea in my project. Do not forget to use lapply.
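A minimal sketch of that idea, assuming every file shares a key column (hypothetically called "id" here):

files <- list.files("/path/to/csvfiles", pattern = "\\.csv$", full.names = TRUE)

# read every file into a list of data frames
dfs <- lapply(files, read.csv)

# merge them pairwise on the shared key column (assumption: "id")
merged <- Reduce(function(x, y) merge(x, y, by = "id"), dfs)

write.csv(merged, "combined.csv", row.names = FALSE)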
I am new to R and I have run into a problem. I have a folder with 50 csv files, each representing a city. I want to import each csv file into RStudio as an independent data frame, to eventually plot all 50 cities in one time-series plot.
There are four things I want to do to each csv file, and in the end these four actions should be automated for all 50 files.
Skip the first 25 rows of the csv file
Combine the Date and Time columns of each csv file
Remove the rows where the value in column 3 is empty
Change the name of column 3 from "ug/m3" to "CO"
After skipping, the first row will be the header.
I used the code below on one csv file to see if it would work. Everything works except for city[,3][!(is.na(city[,3]))].
city1 <- read.csv("path",
skip = 25)
city1$rtime <- strptime(paste(city1$Date, city1$Time), "%m/%d/%Y %H:%M")
colnames(city1)[3] <- "CO"
city[,3][!(is.na(city[,3]))] ## side note: help with this would be appreciated, I was if something goes before the comma especially.
I am not sure how to combine everything efficiently in a function.
I would appreciate suggestions on an efficient way to perform the four actions (in a function, maybe) on each csv file while importing them into R.
Use this function for each csv you want to read:
read_combine <- function(yourfile) {
  file <- read.csv(yourfile, skip = 25)
  file$rtime <- strptime(paste(file$Date, file$Time), "%m/%d/%Y %H:%M")
  colnames(file)[3] <- "CO"
  file <- file[!is.na(file$CO), ]  # drop the rows where CO is empty
  file
}
yourfile must be the path to the csv file.
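To run this over the whole folder at once, a sketch (assuming all 50 files sit in one directory):

paths <- list.files("path/to/folder", pattern = "\\.csv$", full.names = TRUE)

# one cleaned data frame per city, named after its file
cities <- lapply(paths, read_combine)
names(cities) <- basename(paths)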
I have multiple data frames that I would like to export into different tabs of an Excel/csv file. I'll be grouping my 15 data frames into three groups of five. This way I would have three Excel files with five tabs each instead of 15 individual Excel sheets.
To export to Excel:
#fake data
data_1 <- data.frame(c(03, 23, 4, 2))
data_2 <- data.frame(c(0223, 3, 1, 2))
data_3 <- data.frame(c(0232, 3, 1, 1))
data_4 <- data.frame(c(21, 23, 5, 6))
data_5 <- data.frame(c(24, 5, 6, 7))
#fake names
mydatasets <- list(data_1, data_2, data_3, data_4, data_5)  # list(), not c(), keeps each data frame intact
mytitles <- c("data1", "data2", "data3", "data4", "data5")
#for loop to write out individual csv files
for (i in 1:5) {
  a <- mydatasets[[i]]
  title <- mytitles[i]
  myfile <- paste0(title, ".csv")
  write.csv(a, file = myfile)
}
How do I get the above code to merge those csv files into multiple tabs of one csv or Excel file?
CSV files consist of only one sheet. An alternative is to write to XLSX. The function xlsx::write.xlsx takes a sheetName argument:
library(xlsx)
# use the data from the question
for (i in seq_along(mydatasets)) {
  write.xlsx(x = mydatasets[[i]],
             file = "myfile.xlsx",
             sheetName = mytitles[i],
             append = TRUE)
}
Note append = TRUE: without it, each iteration would overwrite the file instead of adding a sheet to it.
The xlsx package depends on rJava. Using that package for the first time sometimes causes trouble; see this question for a common solution.
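If rJava turns out to be a hurdle, a Java-free alternative (a sketch using the openxlsx package, assuming it is installed) writes a named list of data frames as one sheet each:

library(openxlsx)

# each list element becomes one tab, named after the matching title
write.xlsx(setNames(mydatasets, mytitles), file = "myfile.xlsx")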