I am trying to convert a dataframe to a character array in R.
This works, but the text file only contains about 83 records:
data <- readLines("https://www.r-bloggers.com/wp-content/uploads/2016/01/vent.txt")
df <- data.frame(data)
textdata <- df[df$data, ]
This does not work, maybe because it has 3k records?
trump_posts <- read.csv(file="C:\\Users\\TAFer\\Documents\\R\\TrumpFBStatus1.csv",
sep = ",", stringsAsFactors = TRUE)
trump_text <- trump_posts[trump_posts$Facebook.Status, ]
All I know is that I have a data frame called trump_posts. The frame has a single column called Facebook.Status. I just want to turn it into a character vector so I can run an analysis on it.
Any help would be very much appreciated.
Thanks
If Facebook.Status is already a character vector, you can perform your analysis on it directly.
Otherwise, you can try:
trump_text <- as.character(trump_posts$Facebook.Status)
I think you are somehow confusing data.frame syntax with data.table syntax. For a data.frame, you reference a column vector as df$col. For a data.table, it is closer to what you wrote: dt[, col] or dt[, dt$col]. Also, if you want a character vector right away, set stringsAsFactors = FALSE in your read.csv call. Otherwise you'll need an extra conversion, for example dt[, as.character(col)] or as.character(df$col).
And on a side note, the size of a vector is almost never an issue unless you hit the limits of your hardware.
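A minimal, self-contained sketch of both routes, using a toy stand-in for your data frame (the real file path is from your question and not reproduced here):

```r
# Toy stand-in: a one-column data frame whose column is a factor,
# which is what read.csv(..., stringsAsFactors = TRUE) produces
trump_posts <- data.frame(Facebook.Status = c("first post", "second post"),
                          stringsAsFactors = TRUE)

# Route 1: convert the factor column to a character vector
trump_text <- as.character(trump_posts$Facebook.Status)
is.character(trump_text)  # TRUE

# Route 2: avoid the factor in the first place when reading the file
# trump_posts <- read.csv("TrumpFBStatus1.csv", stringsAsFactors = FALSE)
```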
Related
I have several .txt files which need to be imported to R as dataframes for some data analysis. One of these files has no EOL in any form, so I'm left wondering how I would go about to import that.
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
This is what the first ~500 characters of that .txt file look like. The EOL would need to be placed like this:
\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\"
\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA
Normally I would just gsub a "\n" to the places I need it to be, but there is no reoccurring string at the places where I would place a \n, so I don't think that gsub would work in this instance.
Seeing how the missing values are clearly indicated with NA, is there a function similar to read_delim that has a "col_number = x" argument? Like the first x values are the headers, the next x values are the values of the first row and so on and so forth?
If it changes anything, these .txt files are rather big (>300mb).
Big thank you to Julian_Hn. Works like a charm.
I would probably just read this in as a vector and then reshape it into a matrix with the number of columns you know are in the dataset. This essentially does what you want:
str <- "\"A\";\"B\";\"C\";\"D\";\"D\";\"E\";\"F\";\"G\";\"H\";\"I\";\"J\";\"K\";\"L\";\"M\";\"N\";\"O\";\"P\";\"Q\";\"R\";\"S\";\"T\";\"U\";\"V\";\"1\";4;\"55-555-5555-555\";1234-56-78;\"111\";1510;5;1234-12-17;12345.1234512345;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;\"2\";6;\"22-222-2222-222\";5678-56-78;\"222\";2051;0;NA;0;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA;NA"
vec <- strsplit(str,";")[[1]]
# EDIT: added byrow = TRUE to stay in the right format. Thanks Yuriy
tab <- matrix(vec, ncol = 23, nrow = 3, byrow = TRUE)
df <- as.data.frame(tab)
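For a file that large, the same idea works without pasting the contents into a single string first: scan() reads all ";"-separated fields straight into a character vector. A sketch (the real-file call is left as a comment with a placeholder path; the runnable lines use scan()'s text= argument on a tiny example instead):

```r
# On the real file (path is a placeholder):
# vec <- scan("somefile.txt", what = "", sep = ";", quote = "\"")

# Tiny in-memory demonstration of the same call:
vec <- scan(text = "\"A\";\"B\";\"C\";1;2;3", what = "",
            sep = ";", quote = "\"", quiet = TRUE)

# Reshape into the known number of columns, filling row by row
tab <- matrix(vec, ncol = 3, byrow = TRUE)
df  <- as.data.frame(tab, stringsAsFactors = FALSE)
```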
I have imported a CSV file to R but now I would like to extract a variable into a vector and analyse it separately. Could you please tell me how I could do that?
I know that the summary() function gives a rough idea but I would like to learn more.
I apologise if this is a trivial question but I have watched a number of tutorial videos and have not seen that anywhere.
Read the data into a data frame using read.csv. Get the names of the data frame; they should be the names of the CSV columns unless something went wrong. Use dollar notation to get vectors by name. Try reading some tutorials instead of watching videos; then you can try things out.
d = read.csv("foo.csv")
names(d)
v = d$whatever # for example
hist(v) # for example
This is totally trivial stuff.
I assume you have used the read.csv() or read.table() function to import your data into R. (You can get help directly in R with ?, e.g. ?read.csv.)
So normally you have a data.frame. And if you check the documentation, the data.frame is described as a "[...]tightly coupled collections of variables which share many of the properties of matrices and of lists[...]"
So basically you can already handle your data as vectors.
A quick search on SO turned up these two posts, among others:
Converting a dataframe to a vector (by rows) and
Extract Column from data.frame as a Vector
And I am sure there are more relevant ones. Try some good tutorials on R (videos are not as instructive in this case).
There are plenty of good ones on the Internet, e.g.:
* http://www.introductoryr.co.uk/R_Resources_for_Beginners.html (which lists some)
or
* http://tryr.codeschool.com/
Anyways, one way to deal with your csv would be:
#import the data to R as a data.frame
mydata = read.csv(file="SomeFile.csv", header = TRUE, sep = ",",
quote = "\"",dec = ".", fill = TRUE, comment.char = "")
#extract a column to a vector
firstColumn = mydata$col1 # extract the column named "col1" of mydata to a vector
#This previous line is equivalent to:
firstColumn = mydata[,"col1"]
#extract a row to a vector
firstline = mydata[1,] #extract the first row of mydata to a vector
Edit: In some cases[1], you might need to coerce the data in a vector by applying functions such as as.numeric or as.character:
firstline = as.numeric(mydata[1,]) # extract the first row of mydata to a vector
# Note: the entire row *has to be* numeric or compatible with that class
[1] e.g. it happened to me when I wanted to extract a row of a data.frame inside a nested function
I have a .csv files with many rows. However, I want to read only the first row in a vector format. I know that this works:
names(read.csv("file.csv",nrows=1L))
However, it creates a data.frame first before reading the names, which seems very inefficient. Weirdly, this doesn't seem to work:
names(read.csv("file.csv",nrows=0L))
I also tried strsplit(readLines()), but the row contains quotes, which come back with backslashes, so this method doesn't work either.
I have also tried using fread, but it is as slow as read.csv.
Does anyone have a solution to this problem? For reference, here's what the first row looks like:
"Timestamp","Parameter_1","Parameter_2","Parameter_3"
con <- file("somefile.csv")
st <- scan(con, what = "", nlines = 1, sep = ",", quote = "\"")
close(con)
class(st) # "character" -- st is a character vector
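A self-contained way to check this without a file on disk is to swap file() for textConnection(); the header line below is copied from the question:

```r
# In-memory stand-in for the first line of the CSV file
con <- textConnection('"Timestamp","Parameter_1","Parameter_2","Parameter_3"')
st  <- scan(con, what = "", nlines = 1, sep = ",", quote = "\"", quiet = TRUE)
close(con)
st  # c("Timestamp", "Parameter_1", "Parameter_2", "Parameter_3")
```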
I have a CSV file in Excel to be read into R using the read.csv function. However, in the Excel file, some elements are blank, which indicate 0's. When I read the file into R, those elements are still blank. How can I fill these elements as 0's in R? It seems that is.na-like functions won't apply to this situation. Thanks!
It depends on how they're being read into R. Blank cells in a numeric column should usually be interpreted as NA, in which case
your_df$your_column[is.na(your_df$your_column)] <- 0
should work. Your question suggests that doesn't work, in which case they might be read in as empty characters. In that case,
your_df$your_column[your_df$your_column==""] <- 0
ought to do it. If you post a reproducible example (e.g. with a link to the file on Dropbox) it will be possible to be more specific.
Like Drew says, the way to get NAs from blanks is when you read the data in. Provide code and example output of the read-in data for better responses. Also play around with str(), as you can see what classes are in the data, which is valuable info.
You may run into some hanky panky (i.e., the data is a factor or character vector and not numeric) if you have blank cells in a column that should be numeric. This approach addresses that:
## Make up some data with blank cells
dat <- data.frame(matrix(c(1:3, "", "", 1:3, "", 1:3, rep("", 3), 5), 4))

## Replace blanks with 0, then coerce each column to numeric
data.frame(apply(dat, 2, function(x) {
    x[x == ""] <- 0
    as.numeric(x)
}))
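As a sketch of the read-time route mentioned above: read.csv's na.strings argument maps empty cells to NA while the file is parsed, after which the is.na() replacement from the first answer applies (the text= input below stands in for your file path):

```r
# Empty fields become NA at read time thanks to na.strings = ""
dat <- read.csv(text = "a,b\n1,\n,2", na.strings = "")

# Now replace the NAs with 0
dat[is.na(dat)] <- 0
dat
#   a b
# 1 1 0
# 2 0 2
```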
The read.table and read.csv functions in R are used to parse a file or URL containing delimited data and produce an R data frame. However, I already have a character vector that contains the CSV delimited data (using comma and \n as column and record delimiters), so I don't need to read it from a file or URL. How can I pass this character vector into read.table, read.csv, or scan() without writing it to a file on disk first and reading it back in? I realize that writing it to disk is possible, but I am looking for a solution that does not require this needless roundtrip and can read data from the character vector directly.
You can use textConnection() to pass the character vector to read.table(). An example:
x <- "first,second\nthird,fourth\n"
x1 <- read.table(textConnection(x), sep = ",")
x1
#      V1     V2
# 1 first second
# 2 third fourth
Answer found on the R mailing list.
2017 EDIT
Seven years later, I'd probably do it like this:
read.table(text = x, sep = ",")
A minor addendum to neilfws's answer. This wrapper function is great for helping answer questions on Stack Overflow when the questioner has placed raw data in their question rather than providing a data frame.
textToTable <- function(text, ...)
{
dfr <- read.table(tc <- textConnection(text), ...)
close(tc)
dfr
}
With usage, e.g.
textToTable("first,second\nthird,fourth\n", sep = ",")