I have a csv file with data like this
firstcolumn secondcolumn
text1 freetext 1
text2 freetext 2
When I read the csv file I use this:
df <- read.csv("C:/Users/Desktop/testfile.csv", header=TRUE, sep=",")
Is there any parameter I should include in order to have every line of the second column as chr?
I am assuming that when you do read.csv, the second column will be read in as a factor (the default behaviour in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE).
You can do this to cross check:
class(df$secondcolumn)
Now, if you want to convert them to characters, I can think of two ways. The first one does not always work for me, but the second one does.
First one:
stringsAsFactors needs to be set to FALSE:
df <- read.csv("C:/Users/Desktop/testfile.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
Second one:
If the first method does not work, you can convert the particular column to character manually:
df$secondcolumn <- as.character(df$secondcolumn)
Can you use the import wizard in RStudio? You can specify all the formats there, and have the wizard generate the code.
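For reference, the wizard typically generates readr code along these lines (a sketch only; the path and column names are taken from your example, and the column types are what you would pick in the wizard):
library(readr)
# Sketch of what the Import Dataset wizard might generate for this file
testfile <- read_csv("C:/Users/Desktop/testfile.csv",
                     col_types = cols(
                       firstcolumn = col_character(),
                       secondcolumn = col_character()
                     ))
read_csv() never converts character columns to factors, so the second column comes back as chr without any extra arguments.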
I wrote an R script to make some scientometric analyses of Journal Citation Report data (JCR), which I have been using and updating in the past years.
Today, Clarivate introduced some changes in its database, and the exported CSV file now contains an empty last column, which breaks my script. Because of this empty last column, read.csv automatically assumes that the first column contains the row names.
As before, there is also one first useless row, which is automatically removed in my script with skip = 1.
One simple solution to this "empty column situation" would be to manually remove this last column in Excel, and then proceed with my script as usual.
However, is there a way to add this removal to my script using base R?
The beginning of my script is:
jcreco = read.csv("data/jcr ecology 2020.csv",
na = "n/a", skip = 1, header = T)
The original CSV file downloaded from JCR is available in my Dropbox.
Could you please help me? Thank you!
The real problem is that the empty column doesn't have a header. If they had just included the extra comma at the end of the header line as well, this probably wouldn't be as messy. But you can do a bit of column shuffling with fill=TRUE. For example:
dd <- read.table("~/../Downloads/jcr ecology 2020.csv", sep=",",
skip=2, fill=T, header=T, row.names=NULL)
names(dd)[-ncol(dd)] <- names(dd)[-1]
dd <- dd[,-ncol(dd)]
This reads in the data but puts the row names in the data.frame and fills the last column with NA. Then you shift all the column names over to the left and drop the last column.
Here is a way.
Read the data as text lines;
Discard the first line;
Remove the trailing comma with sub;
Create a text connection;
And read in the data from the connection.
The variable fl holds the file name; on my disk I had to set the working directory first.
fl <- "jcr_ecology_2020.csv"
txt <- readLines(fl)
txt <- txt[-1]
txt <- sub(",$", "", txt)
con <- textConnection(txt)
df1 <- read.csv(con)
close(con)
head(df1)
I'm trying to import a csv file into a vector. There are 100 entries in this csv file, and this is what the file looks like:
My code reads as follows:
> choice_vector <- read.csv("choices.csv", header = FALSE, fileEncoding="UTF-8-BOM")
> choice_vector
And yet, when I try to display said vector, it shows up as:
It is somehow creating a second column, and I cannot figure out why. In addition, writing to a new csv file writes the contents of that second column as well.
The second column was enabled ("habilitated") in Excel.
Option 1: Manually delete the column in Excel.
Option 2: Delete all columns that are entirely NA:
choice_vector2 <- choice_vector[,colSums(is.na(choice_vector))<nrow(choice_vector)]
In case you are interested in reading the first column only:
choice_vector <- read.csv("choices.csv", header = FALSE, fileEncoding="UTF-8-BOM")[,1]
Good luck!
Short answer:
You have an issue with your data file, but
choice_vector <- read.csv("choices.csv", header = FALSE, fileEncoding="UTF-8-BOM")$V1
should create the vector that you're expecting.
Long answer:
The read.csv function returns a data frame and you need to address a particular column within the data frame with the $ operator in order to extract that column as a vector. As for why you have an unexpected column of NAs, your CSV probably codes for two columns. When you read a CSV with R, a comma indicates a data field to its right. If you look at your CSV with a text editor, I'm guessing it'll look like this:
A,
B,
D,
A,
A,
F,
The absence of anything (other than another comma or a line break) to the right of a comma is interpreted as NA.
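Here is a tiny self-contained illustration of that effect, using made-up data rather than your actual file:
# Trailing commas make read.csv see a second, all-NA column (illustrative sketch)
txt <- "A,\nB,\nD,\n"
df <- read.csv(text = txt, header = FALSE)
df
#   V1 V2
# 1  A NA
# 2  B NA
# 3  D NA
df$V1   # the first column alone, as a vector (character on R >= 4.0, factor before)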
If we are using fread from data.table, there is a select option to read in only the columns of interest:
library(data.table)
dt <- fread("choices.csv", select = 1)
Other than that, it is not clear why the issue happens. It could be some stray whitespace. If that is the case, specify strip.white = TRUE (by default it is FALSE):
read.csv(("choices.csv", header = FALSE,
fileEncoding="UTF-8-BOM", strip.white = TRUE)
Or, as we commented, copy the columns of interest into a new file, save it, and then read it with read.csv.
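If you prefer to do that copy step in R rather than by hand, something along these lines would work (the output file name here is made up):
library(data.table)
# Read only the first column, write it to a clean file, then read that as usual
dt <- fread("choices.csv", select = 1, header = FALSE)
fwrite(dt, "choices_firstcol.csv", col.names = FALSE)
choice_vector <- read.csv("choices_firstcol.csv", header = FALSE)[, 1]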
I'm improving my R skills by rebuilding some of the amazing stuff they do on r-bloggers. Right now I'm trying to reproduce this:
http://wiekvoet.blogspot.nl/2015/06/deaths-in-netherlands-by-cause-and-age.html. The relevant dataset for this exercise can be found here:
http://statline.cbs.nl/Statweb/publication/?VW=D&DM=SLNL&PA=7052_95&D1=0-1%2c7%2c30-31%2c34%2c38%2c42%2c49%2c56%2c62-63%2c66%2c69-71%2c75%2c79%2c92&D2=0&D3=0&D4=0%2c10%2c20%2c30%2c40%2c50%2c60%2c63-64&HD=150710-0924&HDR=G1%2cG2%2cG3&STB=T
Diving into the code (to be found at the bottom of the first link), I'm running into this piece of code:
r1 <- read.csv(sep=';',header=FALSE,
col.names=c('Causes','Causes2','Age','year','aantal','count'),
na.strings='-',text=txtlines[3:length(txtlines)]) %>%
select(.,-aantal,-Causes2)
Could anybody help me separate the steps that are taken here?
Here is an explanation of what each line in the call to read.csv() is doing in your example. Note that the assignment of the last parameter, text, is complicated and depends on the script from the link you gave above. At a high level, he first reads in all lines from the file "Overledenen__doodsoo_170615161506.csv" which contain the string "Centraal", then uses only the third through final lines from that filtered set. There is an additional step applied to these lines as well.
r1 <- read.csv( # columns separate by semi-colon
sep=';',
# first row is data (i.e. is NOT a header)
header=FALSE,
# names of the six columns
col.names=c('Causes','Causes2','Age','year','aantal','count'),
# treat hyphen as NA
na.strings='-',
# read from third line to final line of the original input
# Overledenen__doodsoo_170615161506.csv, after some
# filtering has been applied
text=txtlines[3:length(txtlines)]) %>% select(.,-aantal,-Causes2)
read.csv reads the csv file, splitting columns on the separator ";",
so that an input like a;b;c is separated into: first column = a, second = b, third = c.
header=FALSE specifies that the original file has no header row.
col.names assigns the listed names to the columns in R.
na.strings='-' treats '-' values in the file as NA.
text=txtlines[3:length(txtlines)] reads the lines from position 3 to the end as the input text.
%>% select(., -aantal, -Causes2) pipes the result into select, which drops the aantal and Causes2 columns.
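A small self-contained sketch of the same pattern, with made-up lines standing in for the filtered txtlines from the original script (so the values here are illustrative only):
library(dplyr)
# Pretend txtlines is the filtered character vector from the original script
txtlines <- c("junk line 1", "junk line 2",
              "Neoplasms;Neoplasms;0-65;2010;-;123",
              "Neoplasms;Neoplasms;65+;2010;-;456")
r1 <- read.csv(sep = ';', header = FALSE,
               col.names = c('Causes','Causes2','Age','year','aantal','count'),
               na.strings = '-',
               text = txtlines[3:length(txtlines)]) %>%
  select(., -aantal, -Causes2)
r1
#      Causes  Age year count
# 1 Neoplasms 0-65 2010   123
# 2 Neoplasms  65+ 2010   456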
I'm using RStudio and I wanted to import csv data.
This data has 3 columns and they are separated by ",".
Now I type test <- read.csv("data1.csv", sep=",")
The data is imported, but it's imported as just ONE column.
The headers are okay, but the headers (all 3 of them) are also combined together into just one column.
If I set header=F, there is a single V1 heading. So there really is just one column.
Why is my separator not working?
Try read_csv() from the readr package:
install.packages("devtools")
devtools::install_github("hadley/readr")
with your sample input
library(readr)
file <- 'Alter.des.Hauses,"Quadratfuß","Marktwert"\n33,1.812,"$90.000,00"\n'
read_csv(file) # for the actual use read_csv("file.csv") ...
read_csv2(file)
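If it turns out the file is actually semicolon-separated (common with German-locale Excel exports, and consistent with everything landing in one column), base R can also read it directly with read.csv2(), which uses ";" as the separator and "," as the decimal mark. This is just a guess, since the actual file isn't shown:
# Base R alternative, assuming the real separator is ";" (a guess)
test <- read.csv2("data1.csv")
str(test)   # check that three columns were detected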
I'd like to export a value defined in R as a vector (or any other object) to a specific location in a text file. With the use of read.fwf I managed to read data that is not csv- or tab-delimited (based on location in the file), but now I cannot find a suitable way to write/export a value to a selected (defined) line/row and column in a txt file. I would appreciate any help or suggestions. I looked at write.table, sink and some other options for data export, but none of them worked, or at least I was not able to complete the task.
You don't need to use read.fwf if you just want to replace specific characters. Instead, scan in the file line by line as a vector of character strings. Then you can use substring<- to replace specific positions by line and column.
Here's a simple example:
mydat <- scan(text='1234567890\n2345678901\n3456789012', what='character')
mydat
# [1] "1234567890" "2345678901" "3456789012"
substring(mydat[2],5,5) <- 'X'
mydat
# [1] "1234567890" "2345X78901" "3456789012"
substring(mydat[3],1,1) <- 'Y'
mydat
# [1] "1234567890" "2345X78901" "Y456789012"
The result can be written back to file using writeLines:
> writeLines(mydat)
1234567890
2345X78901
Y456789012
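Putting it together with a real file rather than an inline string (the path "myfile.txt" here is just a placeholder):
# Round-trip sketch: read a fixed-width text file, patch one character, write it back
mydat <- readLines("myfile.txt")        # one element per line
substring(mydat[2], 5, 5) <- "X"        # replace column 5 of line 2
writeLines(mydat, "myfile.txt")         # write the modified lines back to the file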