Loading data and replacement in R
Hi, sorry, first post here; my apologies if I made a mistake.
I'm fairly new to R, and I was given an assignment where I load a CSV file into R. When I read.csv the whole file, I get a ton of blank spots where values should be. The only info printed out is the "N/A" in the cells, which is actually what I am trying to replace.
So I took a small sample of the file, only the first couple of rows, and the info came up correctly in my read.csv command. My question is: is the layout of the .csv too large to display the original data in my main.csv file?
Also, how would I go about replacing all the "N/A" and NA values in the file to change them to blank cells, i.e. ""?
Sorry if I described my scenario poorly.
First, make sure that all of your data in the CSV file is in GENERAL format!
There should be a title for each of the columns too.
If you have an empty cell in your CSV file, then input a 0 into it.
And make sure that you CLEAR ALL the cells around the data, just in case there is anything funny in them.
Hope that helps; if not, you could send me your file at sgreenaway@vmware.com and I will check it out for you :)
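For the replacement question itself, here is a minimal sketch in R. It assumes the file is named main.csv and that the missing values appear as the literal text "N/A" or "NA"; adjust na.strings to whatever your file actually contains:

# Treat the literal strings "N/A" and "NA" as missing while reading
df <- read.csv("main.csv", na.strings = c("N/A", "NA"), stringsAsFactors = FALSE)

# Replace every missing value with an empty string ""
df[is.na(df)] <- ""

# Write the cleaned data back out without row numbers
write.csv(df, "main_clean.csv", row.names = FALSE)

Note that is.na() will also catch cells that were genuinely empty in the CSV, since read.csv imports those as NA too.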
Related
How to export text from R into one cell in Excel
I have 3205 observations in my dataset. Each observation contains several paragraphs' worth of text and looks something like this:

BRIEF_ID STATE BRIEF
01999110036250 ALABAMA paragraphs of text here...

My goal is to export this dataset into Excel/csv so that it looks exactly like it does in R. So far I've tried different variations of this:

write.table(MyData, file="MyData.csv", sep=",")

Unfortunately, when I use this syntax, it exports into Excel/csv in a very weird way, splitting the paragraphs of text into multiple columns and multiple rows. For example:

BRIEF_ID STATE BRIEF
01999110036250 ALABAMA paragraphs
text of here...

Any idea how I can keep the paragraphs of text together in one cell?

UPDATED TEXT/NOTEPAD EXAMPLE FOR 1 OBSERVATION*

41,' '
0499970019131,ARIZONA,"GOOD AFTERNOON EVERYONE., THANK YOU FOR BEING HERE TODAY., AND I WANT TO UPDATE YOU ON WHERE ARIZONA IS IN ITS CURRENT SITUATION, WHERE OUR NUMBERS, ARE, AND THE ACTION STEPS WE INTEND TO TAKE GOING FORWARD., I WANT TO BEGIN BY JUST AGAIN SAYING THANK YOU TO ALL OF OUR NURSES, DOCTORS, EMERGENCY, MEDICAL RESPONDERS, AND HEALTHCARE WORKERS, T",,,,,,,,,,,,,,,,,,,,[...hundreds of empty fields...], DAY THAT WE ARE DEFINITELY,,,,,,,,,,,,,,,,,,,,[...]
Well, supposing you have a dataframe data in memory, then:

# run install.packages("writexl") to install it
writexl::write_xlsx(data, "my_data.xlsx")
write_xlsx will probably help, but from the CSV posted I think the issue is parsing. The CSV sample gets imported mostly intact in Excel 365 on my machine, in that the main paragraph lands in a single cell, so it must be some CSV/locale setting on your end while importing.

Working with CSV for large amounts of unstructured text that contains commas can cause a lot of strange issues. I would change the separator to | or something even less commonly used by humans, then import the file using Excel Power Query: open a blank workbook, select "Get Data -> Text/CSV" under the Data tab, and tell it what delimiter you used. You can also specify the CSV format in the Power Query import, although Excel takes a good guess.

Also, it may be stating the obvious, but those rows of ,,,,,, will translate into blank columns. I am assuming that is intended; if not, there may be an issue with how the data is structured for export.
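As a sketch of that delimiter change, assuming the dataframe is called MyData as in the question (the output file name is just an example):

# Export with a pipe delimiter so embedded commas in the text cannot split cells;
# quote = TRUE keeps each paragraph wrapped in quotes as a single field
write.table(MyData, file = "MyData.txt", sep = "|",
            quote = TRUE, row.names = FALSE, na = "")

Then point Power Query at MyData.txt and tell it the delimiter is |.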
reading csv file. different variables not detected
I have read loads of CSV files into R. For some reason, this file I am working on, which has several variables, is read as if it had only 1 variable. R loads it and adds it to the global environment, and the number of rows is right, but there is only 1 column. It never happened before. I have been looking around for a solution but can't find one. Thanks! I have tried the following code:

read.csv("file.csv", sep=",", header=TRUE)
read.csv("file.csv")
read.table("file.csv", sep=",")

(image of Excel file)
OK, I think I found out what the problem was. I had checked whether the commas were actually commas: I copied and pasted them, and they looked like commas and behaved like commas. But to make sure, I decided to replace (within Excel) all commas with semicolons. Then I read the file again and it worked. Thanks for the replies!
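For anyone following along, once the file is semicolon-delimited you would read it with something like this (file name as in the question):

# read.csv2 defaults to sep = ";" (and dec = ","), as used in many European locales
df <- read.csv2("file.csv")

# or, equivalently, spell the separator out
df <- read.csv("file.csv", sep = ";")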
"arules" library's "read.transaction()" reads in CSV files with an additional, blank column for every transaction
When you attempt to read CSV files that aren't the default groceries.csv, every transaction has an additional entry in it (a blank space), which will mess up all of the calculations for analysis (and even crash R if your CSV file is big enough). I've tried to insert NAs into all of the blank cells in my CSV file, but I cannot find a way to remove all of them within the read.transactions() command (removing duplicates leaves a single NA). I haven't found a trustworthy way to fix this in any of the other questions on Stack Overflow, nor anywhere else on the internet. Example entry:

> inspect(trans[1:5])
  items
1 {, FACEBOOK.COM, Google, Google Web Search}
It is hard to say. I assume you read the data with read.transactions(). Does your CSV file have leading white space in some or all lines? You could try to use the cols parameter in read.transactions() to fix the problem. An example with data and the code to replicate the problem would help.
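One workaround is to strip the leading blanks out of the file before handing it to read.transactions(). A minimal sketch, assuming a basket-format, comma-separated file; the file name transactions.csv is hypothetical:

library(arules)

# Remove leading spaces and commas from each line so no empty first item is created
lines <- readLines("transactions.csv")
lines <- sub("^[ ,]+", "", lines)

# Write the cleaned lines to a temporary file and read the transactions from that
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)
trans <- read.transactions(tmp, format = "basket", sep = ",", rm.duplicates = TRUE)
inspect(trans[1:5])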
Using R, import data from web
I have just started using R, so this may be a very dumb question. I am trying to import the data using:

emdata=read.csv(file="http://lottery.merseyworld.com/cgi-bin/lottery?days=19&Machine=Z&Ballset=0&order=1&show=1&year=0&display=CSV", header=TRUE)

My problem is that it reads the csv file into a single column, instead of formatting it into however many columns of data there are. (By the way, I am using the lottery data simply because it is publicly available to download; this is an exercise to understand what I can and can't do in R.) Would someone mind helping out, please, even though this is trivial?
Hm, that's kind of obnoxious for a page purporting to be in CSV format. You can skip the first 5 lines, which will cause R to read (most of) the rest of the file correctly:

emdata=read.csv(file=...., header=TRUE, skip=5)

I got the number of lines to skip by looking at the source. You'll still have to remove the cruft in the middle and end, and then clean up the columns (they'll all be factors because of the embedded text). It would be much easier to save the page to your hard disk, edit it to remove all the useless bits, and then import it.

... to answer your REAL question: yes, you can import data directly from the web. In general, wherever you would read a file, you can substitute a fully qualified URL; R is smart enough to do the Right Thing[tm]. This specific URL just happens to be particularly messy.
You could read the text from the given URL, filter out the obnoxious lines, and then read the result as CSV, like so:

lines <- readLines(url("http://lottery.merseyworld.com/cgi-bin/lottery?days=19&Machine=Z&Ballset=0&order=1&show=1&year=0&display=CSV"))
read.csv(text=lines[grep("([^,]*,){5,}", lines)])

The above regular expression matches any line containing at least five commas.
Excel data organized in multiple nested rows, can R read it?
Please see the picture. I've started using R and know that it can read files from Excel, but can it read something formatted like this? http://www.flickr.com/photos/68814612@N05/8632809494/ (my apologies, the upload was not working for me)
Elaborating on some of what's in the comments:

If you load the file into Excel, you can save it as a fixed-width or comma-delimited text file. Either should be easy to read into R. The following may be obvious to you already.

(First, a question: Are you sure that you can't get the data in a format that has one set of data per line? Is it possible that the file you're getting was generated from a different file format that is more conducive to loading the data into R?)

Whether you should start rearranging the data in R or instead manipulate the raw text depends on what comes naturally to you (or to people you have around who can help). For me, personally, I would rearrange the text file outside of R before loading it into R. That's what's easiest for me. Perl is a great language for this purpose, but you could also do it with Unix shell scripts if that's accessible to you, or using a powerful editor such as Vim or Emacs. If you have no preference, I'd suggest Perl. If you have any significant programming experience, you'll be able to learn what you need.

On the other hand, you're already loading it into R, so maybe it would be better to process the data there. For example, you could execute a loop that goes through the text file line by line and does something like this:

while (still have lines to read) {
  read first header line into a vector if this is the first time through the loop; otherwise, read it and throw it away
  read data line 1 into a vector
  read second header line into a vector if this is the first time; otherwise, read it and throw it away
  read data line 2 into a vector
  read third header line into a vector if this is the first time; otherwise, read it and throw it away
  read data line 3 into a vector
  if this is the first time through, concatenate the header vectors; store as the header row in something (a file, a matrix, a dataframe, etc.)
  concatenate the data vectors you've been saving, and store as the next row in the same thing
}
write out the whole 2D data structure

Or if the headers will never change, then you could just embed them literally into the script before the loop, and throw them out no matter what. That will make the code cleaner. Or read the first few lines of the file separately to get the headers, and then have a separate script to read the data and add it to the file with the headers in it. (The headers will probably be useful in R, so I would suggest preserving them at the top of the text file.)
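As a rough R translation of that loop, a sketch only: it assumes the file is called nested.csv, is comma-delimited, and repeats the same three header/data line pairs for every record, so each record occupies six lines:

# Read all lines; each record is: header1, data1, header2, data2, header3, data3
raw <- readLines("nested.csv")
split_line <- function(x) strsplit(x, ",")[[1]]

# Take the headers once, from the first record
headers <- c(split_line(raw[1]), split_line(raw[3]), split_line(raw[5]))

# For every record, concatenate its three data lines into one row
starts <- seq(1, length(raw), by = 6)
rows <- lapply(starts, function(i) {
  c(split_line(raw[i + 1]), split_line(raw[i + 3]), split_line(raw[i + 5]))
})

# Assemble into a dataframe under the combined headers and write it out
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(df) <- headers
write.csv(df, "nested_flat.csv", row.names = FALSE)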