How to export text from R into one cell in Excel

I have 3205 observations in my dataset. Each observation contains several paragraphs worth of text and looks something like this:
BRIEF_ID
STATE
BRIEF
01999110036250
ALABAMA
paragraphs of text here...
My goal is to export this dataset into Excel/csv so that it looks exactly like it does in R. So far I've tried different variations of this:
write.table(MyData, file="MyData.csv", sep=",")
Unfortunately, when I use this syntax, it exports into Excel/csv in a very strange way, splitting the paragraphs of text across multiple columns and multiple rows. For example:
BRIEF_ID
STATE
BRIEF
01999110036250
ALABAMA
paragraphs
text
of
here...
Any idea how I can keep the paragraphs of text together in one cell?
UPDATED: TEXT/NOTEPAD EXAMPLE FOR 1 OBSERVATION
41,' ' 0499970019131,ARIZONA,"GOOD AFTERNOON EVERYONE., THANK YOU FOR BEING HERE TODAY., AND I WANT TO UPDATE YOU ON WHERE ARIZONA
IS IN ITS CURRENT SITUATION, WHERE OUR NUMBERS, ARE, AND THE ACTION STEPS WE INTEND TO TAKE
GOING FORWARD., I WANT TO BEGIN BY JUST AGAIN SAYING THANK
YOU TO ALL OF OUR NURSES, DOCTORS, EMERGENCY, MEDICAL RESPONDERS, AND HEALTHCARE WORKERS,
T",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
DAY THAT WE ARE DEFINITELY,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Well, supposing you have a dataframe data in memory, then:
# run install.packages("writexl") to install it
writexl::write_xlsx(data, "my_data.xlsx")

write_xlsx will probably help, but from the CSV posted I think the issue is parsing. The sample imports mostly intact in Excel 365 on my machine, with the main paragraph in a single cell, so it must be some CSV or locale setting on your end while importing.
Working with CSV for large amounts of unstructured text that contains commas can cause a lot of strange issues. I would change the separator to | or something even less commonly used by humans. Then import it using Excel Power Query: open a blank workbook, select Get Data -> Text/CSV under the Data tab, and tell it which delimiter you used. You can also specify the CSV format in the Power Query import, although Excel takes a good guess.
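As a minimal sketch of that idea (the output file name is just an example), you could write the question's MyData with a pipe delimiter from R; quote = TRUE keeps each paragraph of text inside one quoted field:
write.table(MyData, file = "MyData.txt", sep = "|",
            quote = TRUE, row.names = FALSE)
Then point Power Query at MyData.txt and choose | as the delimiter.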
Also, it may be stating the obvious, but those rows of ,,,,,, will translate into blank columns. I am assuming that is intended; if not, there may be an issue with how the data is structured for export.

Related

copying web table to ms excel does not work anymore due to changed web structure

For statistical analysis in performance sports, I often collect data from https://www.fis-ski.com/en and import it into MS Excel by copy/paste before I work on it in RStudio. As FIS updated its website structure recently, the imported data shows up in one column, and it is not possible to convert it back to a structured table via "text to columns". I also tried using a macro, but as the imported data doesn't have a uniform structure (missing data, i.e. NAs, are not shown as empty cells), converting the data is quite tricky for me as a "non-programmer". The data I would like to extract are the following:
https://www.fis-ski.com/DB/alpine-skiing/biographies.html?lastname=&firstname=&sectorcode=AL&gendercode=M&birthyear=1980-2004&skiclub=&skis=&nationcode=SUI&fiscode=&status=&search=true
...but as I need the results of every single athlete in this list, here is an example...
https://www.fis-ski.com/DB/general/athlete-biography.html?sectorcode=AL&seasoncode=&competitorid=230012&type=result&categorycode=&sort=&place=&disciplinecode=&position=&limit=1000
..., I have a lot of data which I need to get in order! So, I have two questions:
Is there an easy method to get the copied data back in order as a table?
Is there a way to extract the results-data from all athletes (SUI, male, YoB 1980-2004) without switching from athlete to athlete?
Thank you very much in advance... looking forward to your answers...
Greetings!!
You wrote that you are not a programmer, so this would be complicated to explain in full, but I have a solution for you. On either of those two pages, open your browser's developer tools with F12, go to the "Console" tab, then paste this and press Enter:
copy([...$('.thead .container, .table-row .container')].map(e => [...$(e).children(':visible:not(:has(.g-sm-24), :has(.g-xs-24), :has(.pale))'), ...$(e).find(':has(.g-sm-24), :has(.g-xs-24), :has(.pale)').find(':visible.g-sm-24, :visible.g-xs-24, :visible.pale')].map(c => c.innerText.replace(/\n/g, ' ').trim()).join('\t')).join('\r\n'))
This will copy a nice table for you into the clipboard, which you can paste into Excel.
Small caveat: the order of the columns is a bit different (because some of those columns are "special" in that they are actually a group of columns which is normally shown or hidden together depending on whether you are on mobile or not).
By the way, this is then stored in the command history, so next time you just need to press the up arrow ↑ key in the console input field to recall the command and press Enter again.
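For the second question (getting every athlete without clicking through them one by one), here is a hedged sketch in R using rvest, since you already work in RStudio. It assumes the listing page can be fetched directly, that each athlete link carries a competitorid= parameter, and that the links are absolute; because the FIS tables are rendered as div grids rather than HTML tables (see the selectors in the console snippet above), html_table() will not work, so treat the selectors as starting points to verify in the browser inspector:
library(rvest)

# Listing page for SUI, male, YoB 1980-2004 (URL from the question)
listing_url <- "https://www.fis-ski.com/DB/alpine-skiing/biographies.html?lastname=&firstname=&sectorcode=AL&gendercode=M&birthyear=1980-2004&skiclub=&skis=&nationcode=SUI&fiscode=&status=&search=true"

# Collect all links on the page, then keep only athlete-biography links
links <- read_html(listing_url) |>
  html_elements("a") |>
  html_attr("href")
athlete_urls <- unique(grep("competitorid=", links, value = TRUE))

for (u in athlete_urls) {
  page <- read_html(u)
  # ...extract the result rows here, e.g. from html_elements(page, ".table-row")...
}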

When I upload my excel file to R, the column titles are in the rows and the data seems all jumbled. How do I fix this?

Hi, literally day-one new coder here.
On the Excel sheet my data looks organized, but when I upload my file to R, it isn't able to read the Excel file properly: the column headers end up in the rows and the data seems randomized.
So far I have tried:
library(readxl)
dataset <- read_excel("pathname")
View(dataset)
Also tried:
dataset <- read_excel("pathname", sheet = 1, col_names = TRUE)
Also tried to use the package openxlsx
but nothing is giving me the correct, organized data set.
I tried converting my Excel file to a CSV file, and the CSV looks exactly like the data that shows up in R (both are messed up).
How should I approach this problem?
I deal with importing .xlsx files into R frequently. It can be challenging due to the flexibility of the Excel platform. I generally use readxl::read_xlsx() to fetch data from .xlsx files. My suggestions:
First, specify exactly the data you want to import with the range argument. From the readxl documentation:
A cell range to read from, as described in cell-specification. Includes typical Excel ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14".
Second, if there are merged cells or other formatting challenges in the column headers, I resort to setting col_names = FALSE and supplying clean names after import with names(df) <- c("first_col", "second_col").
Third, if there are merged cells elsewhere in the spreadsheet, I generally resort to "fixing" them in Excel (not ideal, but easier for my use case); others may have suggestions on a programmatic fix.
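Putting the first two suggestions together, a minimal sketch (the file name, range, and replacement column names are placeholders):
library(readxl)

# Read only the cells you want, skipping the messy header row
df <- read_excel("my_file.xlsx", range = "Sheet1!B3:D87", col_names = FALSE)

# Supply clean names yourself (three names for the three columns B:D)
names(df) <- c("first_col", "second_col", "third_col")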
It may be helpful to provide a screenshot of your spreadsheet.

loading data and replacement in R

Hi sorry first post here my apologies if I made a mistake.
So I'm fairly new to R, and I was given an assignment where I am loading a CSV file into R. When I read.csv the whole file, I get a ton of blank spots where values should be. The only info printed out is the N/A in the cells, which is actually what I am trying to replace.
So I took a small sample of the file, only the first couple of rows, and the info came up correctly in my read.csv command. My question is: is the layout of the .csv too large to display the original data in my main .csv file?
Also, how would I go about replacing all the N/A and NA's in the file to change them to blank cells or ""?
Sorry if I described my scenario poorly.
First, make sure that all of your data in the CSV file is in GENERAL format!
There should be a title for each of the columns too.
If you have an empty cell in your CSV file, then input a 0 into it.
And make sure that you CLEAR ALL the cells around the data, just in case there is anything funny in them.
Hope that helps; if not, you could send me your file at sgreenaway#vmware.com and I will check it out for you :)
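On the replacement itself, a minimal sketch, assuming your file is called main.csv and the cells literally contain the text N/A or NA: read.csv can treat those strings as missing on the way in, and write.csv can write missing values back out as blank cells:
df <- read.csv("main.csv", na.strings = c("N/A", "NA"))
write.csv(df, "main_clean.csv", na = "", row.names = FALSE)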

Using R, import data from web

I have just started using R, so this may be a very dumb question. I am trying to import the data using:
emdata=read.csv(file="http://lottery.merseyworld.com/cgi-bin/lottery?days=19&Machine=Z&Ballset=0&order=1&show=1&year=0&display=CSV",header=TRUE)
My problem is that it reads the csv file into a single column (by the way, I'm using the lottery data simply because it is publicly available to download -- it's an exercise to understand what I can and can't do in R), instead of formatting it into however many columns of data there are. Would someone mind helping out, please, even though this is trivial?
Hm, that's kind of obnoxious for a page purporting to be in csv format. You can skip the first 5 lines, which will cause R to read (most of) the rest of the file correctly.
emdata=read.csv(file=...., header=TRUE, skip=5)
I got the number of lines to skip by looking at the source. You'll still have to remove the cruft in the middle and end, and then clean up the columns (they'll all be factors because of the embedded text).
It would be much easier to save the page to your hard disk, edit it to remove all the useless bits, then import it.
... to answer your REAL question, yes, you can import data directly from the web. In general, wherever you would read a file, you can substitute a fully qualified URL -- R is smart enough to do the Right Thing[tm]. This specific URL just happens to be particularly messy.
You could read text from the given url, filter out the obnoxious lines and then read the result as CSV like so:
lines <- readLines(url("http://lottery.merseyworld.com/cgi-bin/lottery?days=19&Machine=Z&Ballset=0&order=1&show=1&year=0&display=CSV"))
read.csv(text=lines[grep("([^,]*,){5,}", lines)])
The above regular expression matches any lines containing at least five commas.

X12 seasonal adjustment program from census, problem with input file extensions

I downloaded the X12 seasonal adjustment program located here: http://www.census.gov/srd/www/x12a/x12downv03_pc.html
I followed the setup and got the settings correct. When I go to select a file to input, I have four options for file extensions to import, which are ".spc", ".mta", ".dta", and "."
The problem is that I have data in Excel, and despite searching extensively through search engines I cannot figure out a way to get data from Excel into one of these formats so I can do a seasonal adjustment on my data. Thanks.
ADDED: After converting to a .dta file (using R, thanks to the comments left below), it looks like the program also makes you convert it to a .spc file. Anyone have a lead on how to do this? Thanks.
My first reaction is to:
(1) export the data from Excel in something simple like CSV;
(2) import that data into R;
(3) use the R library "foreign" to export the data in .dta format.
So with the file "test.csv" containing:
V1,V2
1,2
3,4
5,6
you could do the following to produce "test.dta":
library(foreign)
testdata <- read.csv("test.csv")
write.dta(testdata,"test.dta")
Voila, data in .dta format. Would this work for what you have?
I've only ever used the command-line version of X12, but it sounds like you may be using the Windows interface instead? If so, the following might not be entirely accurate, but it should be close enough (I hope!).
The .dta and .mta files you refer to are just metafiles containing text lists of either spec files or data files to be processed; in particular the .dta files X12 uses are NOT Stata data format files like those produced by Nathan's R-based answer. It's probably best to ignore using metafiles until you are comfortable enough using the software to adjust a single time series.
You can export your data in tab-separated format (year month/quarter value) without headings and use that as your data file. You can also use a simple list of data values separated by spaces, tabs, or newlines and then tell X12ARIMA what the start and end dates of the series are in the .spc file.
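As a minimal sketch of that export step from R (assuming a data frame mydata with year, period, and value columns; the names are placeholders), the following writes the (year month/quarter value) layout without headings:
write.table(mydata[, c("year", "period", "value")], file = "series.dat",
            sep = "\t", row.names = FALSE, col.names = FALSE)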
The .spc file doesn't contain the input data, it's a specification file telling X12 where to find the data file and how you want those data to be processed -- you'll have to write them yourself or create them in Win X-12.
Ideally you should write a separate .spc file for each time series to be adjusted; while you can write a .spc file which invokes many of X12's autoselection and identification procedures, it's usually not a good idea to treat the process as a black box, and a bit of manual intervention in the .spc is often necessary to get a good quality adjustment (and essential if there's a seasonal break involved). I find it helpful to start with a fairly generic skeleton .spc file suitable for your computing environment and then tweak it from there as appropriate for each series.
If you really want to use a single .spc file to adjust multiple series, then you can provide a list of data files in a .dta file and a single .spc file instructing X12ARIMA how to adjust them, but take care to ensure this is appropriate for your data!
The "Getting started with X-12-ARIMA input files on your PC" document on that site is probably a good place to start reading, but you'll probably end up having to consult the complete reference documentation (in particular Chapters 3 and 7) as well.
Edit postscript:
The UK Office for National Statistics have a draft of their guide to seasonal adjustment with X12ARIMA available online here (archive.org), and it is worth a look. It's a good bit easier to work through than the Census Bureau documentation.
Ryan,
This is not elegant, but it might work for you. In this example I'm trying to replicate the spec file from Example 3.2 in the Census documentation.
Concatenate the data into one text string, then save this single text string using the MS-DOS (TXT) format under the SAVE AS command. To make the text string, first insert two cells above your column header, and in the second one type the following text:
series{title=
Next, insert double quotation marks before and after the text in your column header, like this:
"Monthly Retail Sales of Household Appliance Stores"
Directly below the last data row, insert rows of text that list the model specifications, like the following:
)
start= 1972.jul}
transform{function = log}
regression{variables=td}
identify{diff=(0,1) sdiff=(0,1)}
So you should have something like the following:
<blank row>
series{title=
"Monthly Retail Sales of Household Appliance Stores"
530
529
...
592
590
start= 1972.jul}
transform{function = log}
regression{variables=td}
identify{diff=(0,1) sdiff=(0,1)}
For the next instructions I am assuming that the text series{title= appears in cell A2 and that cell B1 is empty. In cell B2, insert the following:
=CONCATENATE(B1,A2," ")
Then copy this formula into every cell down the column to concatenate all of the text in column A into a single cell at the end of column B. Finally, copy the final cell to a new spreadsheet's cell A1 using PASTE SPECIAL/VALUE, and save this spreadsheet using SAVE AS: TXT (MS-DOS), but change the extension to ".spc".
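If you would rather stay in R than do the concatenation in Excel, a hedged sketch of the same idea: build the lines of the .spc file as a character vector, replicating the layout shown above, and write it with writeLines (the values shown are the truncated ones from the example; the file name is a placeholder):
values <- c(530, 529, 592, 590)  # replace with your full data column
spc <- c("series{title=",
         "\"Monthly Retail Sales of Household Appliance Stores\"",
         values,  # numbers are coerced to text when combined with strings
         "start= 1972.jul}",
         "transform{function = log}",
         "regression{variables=td}",
         "identify{diff=(0,1) sdiff=(0,1)}")
writeLines(spc, "example.spc")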
Good luck (and from the little I read of the Census documentation - you'll need it).