So I have a .csv file formatted as such:
Student,Grade
Steven,48
Tori,79
James,92
Elise,44
So I read it into R and manipulate the data a bit:
data = read.csv("/path/to/my.csv")
grades = data$Grade
grades = grades + 10
All I need to do now is write these new grades back to the Grade column of my.csv, while preserving the formatting of the original csv (and the names). What is the simplest way to accomplish this?
Write the data back into the data.frame:
data$Grade <- grades ## you can actually skip the middle step here
Use write.table() with the correct settings:
write.table(data, file="/path/to/my.csv", sep=",", row.names=FALSE, quote=FALSE) ## quote=FALSE keeps the names unquoted, as in the original file
data$Grade <- data$Grade + 10
write.csv(data, file="/out/file/path.csv", row.names=FALSE) ## row.names=FALSE avoids an extra column of row numbers
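For the sample data above, the write.table() call produces a file like this (without quote=FALSE, the header and the names would come out wrapped in quotes):
Student,Grade
Steven,58
Tori,89
James,102
Elise,54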
I am a newbie and need help with manipulating the data I have.
I have an Excel workbook with 12 sheets, each having approximately 140k rows.
Is it possible to combine them all into one file via R and then export it to csv or txt, please?
Thank you.
I tried using readxl and tidyverse:
path <- "C:/data"
setwd(path)
sheet = excel_sheets("df.xlsx")
data = lapply(setNames(sheet, sheet), function(x) read_excel("df.xlsx", sheet = x))
data = bind_rows(data, id = "sheet")
lapply(data, function(x) write.table(data(x), 'data0.csv', append = T, sep = ','))
And I still don't get one file with all the data sheets combined.
Here's a simple approach using data.table and openxlsx. I'm not sure about the full structure of your data but you can easily perform other operations when reading in the data (if needed) before combining it all and writing to an output file.
library(data.table)
library(openxlsx)
file <- 'my_file.xlsx' #full path and name of your file
sheet_names <- getSheetNames(file = file)
# loop through sheetnames to read data
data_list <- lapply(sheet_names, function(z){
dat <- as.data.table(read.xlsx(xlsxFile = file, sheet = z))
dat$sheet <- z #added to check which sheet the data was retrieved from
# other operations could be added here, e.g. any sheets that contain
# "raw" in the name need addl. calculations
return(dat)
})
# bind all data together
data_combined <- rbindlist(l = data_list, use.names = TRUE, fill = TRUE)
# write to a csv file - xlsx might exceed max. allowable rows
fwrite(x = data_combined, file = 'new_file_name.csv')
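For comparison, here is a minimal sketch of the same idea using readxl and dplyr, the packages the asker tried; the file and output names are assumed from the question:
library(readxl)
library(dplyr)
file <- "df.xlsx"                    # full path to the workbook
sheets <- excel_sheets(file)         # names of all the sheets
# read every sheet into a named list of data frames
data_list <- lapply(setNames(sheets, sheets),
                    function(s) read_excel(file, sheet = s))
# stack them; note it is .id (with a dot), not id, that records the sheet
data_combined <- bind_rows(data_list, .id = "sheet")
write.csv(data_combined, "data0.csv", row.names = FALSE)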
I wrote an R script to make some scientometric analyses of Journal Citation Reports (JCR) data, which I have been using and updating over the past years.
Clarivate has just introduced some changes in its database, and the exported CSV file now contains a final empty column, which breaks my script. Because of this last empty column, read.csv automatically assumes that the first column contains the row names.
As before, there is also a useless first row, which my script automatically removes with skip = 1.
One simple solution to this "empty column situation" would be to manually remove this last column in Excel, and then proceed with my script as usual.
However, is there a way to add this removal to my script using base R?
The beginning of my script is:
jcreco = read.csv("data/jcr ecology 2020.csv",
na = "n/a", skip = 1, header = T)
The original CSV file downloaded from JCR is available in my Dropbox.
Could you please help me? Thank you!
The real problem is that the empty column doesn't have a header. If they had only included the extra comma at the end of the header line, this probably wouldn't be as messy. But you can also do a bit of column shuffling with fill=TRUE. For example:
dd <- read.table("~/../Downloads/jcr ecology 2020.csv", sep=",",
skip=2, fill=T, header=T, row.names=NULL)
names(dd)[-ncol(dd)] <- names(dd)[-1]
dd <- dd[,-ncol(dd)]
This reads in the data but puts the row names in the data.frame and fills the last column with NA. Then you shift all the column names over to the left and drop the last column.
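Here is a minimal reproduction of the trick on made-up data (a hypothetical three-column file: a useless first line, a header with no trailing comma, and data lines that each end in a comma):
txt <- c("some useless first line", "a,b,c", "1,2,3,", "4,5,6,")
dd <- read.table(text = txt, sep = ",", skip = 1, fill = TRUE,
                 header = TRUE, row.names = NULL)
names(dd)                              # "row.names" "a" "b" "c"
names(dd)[-ncol(dd)] <- names(dd)[-1]  # shift the names one slot left
dd <- dd[, -ncol(dd)]                  # drop the NA-filled last column
After this, dd has columns a, b, and c with the right values.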
Here is a way.
Read the data as text lines;
Discard the first line;
Remove the end comma with sub;
Create a text connection;
And read in the data from the connection.
The variable fl holds the file name; on my disk I had to set the working directory first.
fl <- "jcr_ecology_2020.csv"
txt <- readLines(fl)
txt <- txt[-1]
txt <- sub(",$", "", txt)
con <- textConnection(txt)
df1 <- read.csv(con)
close(con)
head(df1)
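On made-up data of the same shape (a hypothetical three-column file with a trailing comma on every data line but not on the header), the steps play out like this:
txt <- c("some useless first line", "a,b,c", "1,2,3,", "4,5,6,")
txt <- txt[-1]             # discard the useless first line
txt <- sub(",$", "", txt)  # strip the trailing comma from each data line
con <- textConnection(txt)
df1 <- read.csv(con)       # a clean three-column data.frame
close(con)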
I am combining 1 column of data from multiple/many text files into a single CSV file. This part is fine with the code I have. However, I would like the filename of each imported file (e.g., "roth_Aluminusa_E1.0.DPT") to become the column header for the data column taken from that file. I know similar questions have been asked, but I can't work it out. Thanks for any help :-)
Code I am using which works for combining the files:
files3 <- list.files()
WAVELENGTH <- read.table(files3[1], header=FALSE, sep=",")[ ,1]
TEST9 <- do.call(cbind,lapply(files3,function(fn)read.table(fn, header=FALSE, sep = ",")[ , 2]))
TEST10 <- cbind(WAVELENGTH, TEST9)
You can do the following to add column names to TEST10. This assumes the column name you want for the first column is files3[1]:
colnames(TEST10) <- c(files3[1], files3)
In case you want to keep the name of the first column as is, then we add the desired column names before binding WAVELENGTH with TEST9.
colnames(TEST9) <- files3
TEST10 <- cbind(WAVELENGTH, TEST9)
Then you can write to a csv as usual, keeping the column names as headers in the resulting file.
write.csv(TEST10, file = "TEST10.csv", row.names = FALSE)
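Putting both pieces together, a sketch of the whole workflow (assuming, as in the question, that every file in the directory is one of these two-column text files sharing the same wavelength axis):
files3 <- list.files()
# first column of the first file supplies the shared wavelength axis
WAVELENGTH <- read.table(files3[1], header = FALSE, sep = ",")[, 1]
# second column of every file, bound side by side
TEST9 <- do.call(cbind, lapply(files3, function(fn)
  read.table(fn, header = FALSE, sep = ",")[, 2]))
colnames(TEST9) <- files3  # file names become the column headers
TEST10 <- cbind(WAVELENGTH, TEST9)
write.csv(TEST10, file = "TEST10.csv", row.names = FALSE)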
Hi, so I have data in the following format:
101,20130826T155649
------------------------------------------------------------------------
3,1,round-0,10552,180,yellow
12002,1,round-1,19502,150,yellow
22452,1,round-2,28957,130,yellow,30457,160,brake,31457,170,red
38657,1,round-3,46662,160,yellow,47912,185,red
and I have been reading them in and cleaning/formatting them with this code:
b <- read.table("sid-101-20130826T155649.csv", sep = ',', fill=TRUE, col.names=paste("V", 1:18,sep="") )
b$id<- b[1,1]
b<-b[-1,]
b<-b[-1,]
b$yellow<-b$V6
and so on
There are about 300 files like this, and ideally they would all be compiled without the first two lines, since the first line is just an id (I made a separate column to identify these data). Does anyone know how to read these tables quickly, clean and format them the way I want, then compile them into one large file and export it?
You can use lapply to read all the files, clean and format them, and store the resulting data frames in a list. Then use do.call to combine all of the data frames into a single large data frame.
# Get vector of files names to read
files.to.load = list.files(pattern="csv$")
# Read the files
df.list = lapply(files.to.load, function(file) {
df = read.table(file, sep = ',', fill=TRUE, col.names=paste("V", 1:18,sep=""))
  # ... cleaning and formatting code goes here
df$file.name = file # In case you need to know which file each row came from
return(df)
})
# Combine into a single data frame
df.combined = do.call(rbind, df.list)
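For instance, the cleaning steps from the question could be dropped in like this (a sketch; "all_sessions.csv" is just a made-up output name):
df.list = lapply(files.to.load, function(file) {
  b = read.table(file, sep = ',', fill = TRUE,
                 col.names = paste("V", 1:18, sep = ""))
  b$id = b[1, 1]       # the first line holds only the id
  b = b[-(1:2), ]      # drop the id line and the dashed separator
  b$yellow = b$V6      # ...and so on for the other formatting steps
  b$file.name = file
  return(b)
})
df.combined = do.call(rbind, df.list)
write.csv(df.combined, "all_sessions.csv", row.names = FALSE)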
I have a technical question in R:
how can I row-bind the following results (result1 and result2) into a data frame while keeping the column labels for both?
result1:
meanAUC.SIM meanCmax.SIM meanTmax.SIM  AUC.OBS Cmax.OBS Tmax.OBS   PE.AUC  PE.Cmax  PE.Tmax
   777.4444     74.64377     4.551254 820.7667 73.46508 3.089009 5.278274 1.604416 47.33703
result2:
medianAUC.SIM medianCmax.SIM medianTmax.SIM AUC.OBS Cmax.OBS Tmax.OBS   PE.AUC  PE.Cmax PE.Tmax
     764.6611        72.4534            4.5 795.765     68.2        3 3.908683 6.236657      50
The reason behind this is that I want to write them in a *.csv file in an organized way with the correct labeling.
If the only reason you want to combine the data frames is to write them to a csv file, then you can instead just write each data frame separately to the same csv file. For example:
write.table(result1, "myfile.csv", row.names=FALSE, sep=",")
# If you want a blank row between them
cat("\n", file = "myfile.csv", append = TRUE)
write.table(result2, "myfile.csv", row.names=FALSE, sep=",", append=TRUE)
Here's roughly what the resulting file looks like:
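"meanAUC.SIM","meanCmax.SIM","meanTmax.SIM","AUC.OBS","Cmax.OBS","Tmax.OBS","PE.AUC","PE.Cmax","PE.Tmax"
777.4444,74.64377,4.551254,820.7667,73.46508,3.089009,5.278274,1.604416,47.33703

"medianAUC.SIM","medianCmax.SIM","medianTmax.SIM","AUC.OBS","Cmax.OBS","Tmax.OBS","PE.AUC","PE.Cmax","PE.Tmax"
764.6611,72.4534,4.5,795.765,68.2,3,3.908683,6.236657,50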