I am reading data from a simple spreadsheet but getCell() returns all the data from a row and not just the indicated cell.
For example, I have a spreadsheet with random text in the first few cells.
      __A__ __B__ __C__ __D__ __E__
   1 | aa  | bb  | cc  | dd  | ee  |
   2 |_____|_____|_____|_____|_____|
I fire up PHPExcel and run this code:
$objReader = PHPExcel_IOFactory::createReaderForFile($file);
$objPHPExcel = $objReader->load($file);
$foo = $objPHPExcel->getActiveSheet()->getCellByColumnAndRow('A', 1)->getValue();
$bar = $objPHPExcel->getActiveSheet()->getCell('A1')->getValue();
The result for both $foo and $bar is one string of all the values:
aa bb cc dd ee
Why is this happening? Thanks.
PHP v5.3.13
PHPExcel v1.7.9
Check the file in a text editor. You normally only get this if the file is a CSV file that doesn't use the default separator (,).
Also note that the first argument to getCellByColumnAndRow() should be a numeric value for the column.
Instead of passing 'A', 1 as arguments to getCellByColumnAndRow, try passing 0, 1 instead. That should correctly target the data you are looking for.
I got a response from PHPExcel's support team. The XLS document in question is from a third party. When I ran PHPExcel_IOFactory::identify($file), the result was 'CSV'. Obviously the document has formatting issues. That does explain the problem, though, since the cells could not be properly interpreted.
The response: https://phpexcel.codeplex.com/discussions/463319#post1111959
From an API I get a Base64 encoded dataset. I use RCurl::base64 to decode it, but it's serialized. How do I convert it to a dataframe?
After decoding my return, I get a long textstring with semi colon separated data and column names. Looks like this:
[1] "\"lfdn\";\"pseudonym\";\"external_lfdn\";\"tester\"\r\n\"50\";\"434444345\";\"0\";\"0\"\r\n\"91\";\"454444748\";\"0\";\"0\"\r\n\
You can see the structure with a simple cat(x):
"lfdn";"pseudonym";"external_lfdn";"tester"
"50";"434444345";"0";"0"
"91";"454444748";"0";"0"
"111";"444444141";"0";"0"
I've tried the obvious unserialize(x), but I get:
Error in unserialize(enc) :
  character vectors are no longer accepted by unserialize()
Whatever I throw at it... I can write the object to disk, and read it back in, but I prefer to avoid that.
Getting the data from the serialized textstring into a dataframe with column names would be great!
This should do the trick:
read.table(text = x, header = TRUE, sep = ";")
# lfdn pseudonym external_lfdn tester
# 1 50 434444345 0 0
# 2 91 454444748 0 0
Note: I copied your string from above; it does not contain the last row with 111 in it.
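For completeness, here is a minimal end-to-end reproduction (the variable name x is illustrative). The key point is that the decoded payload is plain ";"-delimited text with \r\n line endings, not an R serialization, so read.table() parses it directly and unserialize() is not needed:

```r
# the decoded API payload is just ";"-delimited text with \r\n line endings
x <- paste0(
  '"lfdn";"pseudonym";"external_lfdn";"tester"\r\n',
  '"50";"434444345";"0";"0"\r\n',
  '"91";"454444748";"0";"0"\r\n'
)

# a 2-row data frame with columns lfdn, pseudonym, external_lfdn, tester
df <- read.table(text = x, header = TRUE, sep = ";")
```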
I have a very large csv file (1.4 million rows). It is supposed to have 22 fields and 21 commas in each row. It was created by taking quarterly text files and compiling them into one large text file so that I could import into SQL. In the past, one field was not in the file. I don't have the time to go row by row and check for this.
In R, is there a way to verify that each row has 22 fields or 21 commas? Below is a small sample data set. The possibly missing field is the 0 in the 10th slot.
32,01,01,01,01,01,000000,123,456,0,132,345,456,456,789,235,256,88,4,1,2,1
32,01,01,01,01,01,000001,123,456,0,132,345,456,456,789,235,256,88,5,1,2,1
You can use the base R function count.fields() to do this:
count.fields(tmp, sep=",")
[1] 22 22
The input for this function is the name of a file or a connection. Below, I supplied a textConnection. For large files, you would probably want to feed this into table:
table(count.fields(tmp, sep=","))
Note that, combined with length(), this can also be used to count the number of rows in a file, similar to the output of wc -l on *nix systems.
data
tmp <- textConnection(
"32,01,01,01,01,01,000000,123,456,0,132,345,456,456,789,235,256,88,4,1,2,1
32,01,01,01,01,01,000001,123,456,0,132,345,456,456,789,235,256,88,5,1,2,1"
)
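To locate the offending rows rather than just tabulate the counts, which() on the result works. A small sketch with a deliberately short second row; the expected count of 7 here stands in for your 22:

```r
tmp <- textConnection(
"32,01,01,000000,123,456,0
32,01,000001,123,456,0
32,01,01,000002,123,456,0")

n <- count.fields(tmp, sep = ",")
close(tmp)

which(n != 7)   # row numbers with the wrong field count -> 2
```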
Assuming df is your data frame:
apply(df, 1, length)
This will give you the length of each row.
Sample input tab-delimited text file. Note there is bad data from this source file: the enclosing " at the end of line 3 is actually two lines down. So there is one completely blank line, followed by a line with just the double-quote character, and then the good data continues on the next line.
id ca cb cc cd
1 hi bye hey nope
2 ab cd ef "quoted text here"
3 gh ij kl "quoted text but end quote is 2 lines down
"
4 mn op qr lalalala
When I read this into R (I tried both read.csv and fread, with and without blank.lines.skip = TRUE for fread), I get the following data table:
id ca cb cc cd
1 1 hi bye hey nope
2 2 ab cd ef quoted text here
3 3 gh ij kl quoted text but end quote is 2 lines down
4 4 mn op qr lalalala
The data table does not show the 'bad' lines. OK, good! However, when I go to write out this data table (I tried both write.table and fwrite), those two bad lines of nothing, and the double quote, are written out just as they appear in the input file!
I've tried doing:
dt[complete.cases(dt),],
dt[!apply(dt == "", 1, all),]
to clear out empty data before writing out, but it does nothing. The data table still only shows those 4 entries. Where is R keeping this 'missing' data? How can I clear out that bad data?
I hope this is a 'one-off' bad output from the source (good ol' US Govt!). I think they saved this from an xls file that had bad formatting in a column, causing the text file to contain this mistake; they obviously did not check the output.
After sitting back and thinking through the reading functions: because that column (cd) is quoted, there are actually two newline characters at the end of the string, which are not shown in the data table element! So writing out that element also writes those two line breaks.
All I needed to do was:
dt$cd <- gsub("[\r\n]", "", dt$cd)
and that fixed it; the output written to file now has the correct rows of data.
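A minimal reproduction of the effect (using read.csv on an inline string for illustration): the quoted field keeps its embedded newlines in the cell value, and stripping them fixes the written output.

```r
# a quoted field spanning extra lines keeps the embedded newlines in its value
x <- read.csv(text = 'id,cd\n1,"quoted text\n\n"\n2,lalalala')

grepl("\n", x$cd[1])               # TRUE: the line breaks live in the cell

x$cd <- gsub("[\r\n]", "", x$cd)   # strip them before writing out
```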
I wish I could remove my question...but maybe someday someone will come across the same "issue". I should have stepped back and thought about it before posting the question.
I have a csv file which looks like this-
#this is a dataset
#this contains rows and columns
ID value1 value2 value3
AA 5 6 5
BB 8 2 9
CC 3 5 2
I want to read the csv file excluding those comment lines. It is possible to skip lines that start with '#'. But the problem here is that there is an empty line after the comments, and my other csv files can have varying numbers of comment lines. The main header will always start with "ID", which is where I want to start reading.
Is it possible to specify somehow that reading should start at "ID"? If yes, please give an example.
Thanks in advance!
Use the comment.char option:
read.delim('filename', comment.char = '#')
Empty lines will be skipped automatically by default (blank.lines.skip = TRUE). You can also specify a fixed number of lines to skip via skip = number. However, it’s not possible to specify that it should start reading at a given line starting with 'ID' (but like I’ve said it’s not necessary here).
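A quick check with the sample from the question written to a temp file (read.table here, since the sample is whitespace-separated; comment.char = "#" is in fact already the default for read.table):

```r
f <- tempfile(fileext = ".csv")
writeLines(c(
  "#this is a dataset",
  "#this contains rows and columns",
  "",
  "ID value1 value2 value3",
  "AA 5 6 5",
  "BB 8 2 9",
  "CC 3 5 2"
), f)

# comment and blank lines are dropped; reading starts at the ID header
df <- read.table(f, header = TRUE, comment.char = "#")
```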
For those looking for a tidyverse approach, this will do the job, similar to @Konrad Rudolph's answer:
readr::read_delim('filename', comment = '#')
If you know in advance the number of lines before the header, you can use the skip option (here 3 lines):
read.table("myfile.csv", skip = 3, header = TRUE)
I am new to R and my question should be trivial. I need to create a word cloud from a txt file containing the words and their occurrence numbers. For that purpose I am using the snippets package.
As can be seen at the bottom of the link, I first have to create a vector (is it right that words is a vector?) like below.
> words <- c(apple=10, pie=14, orange=5, fruit=4)
My problem is to do the same thing but create the vector from a file which would contain words and their occurrence number. I would be very happy if you could give me some hints.
Moreover, to understand the format of the file to be inserted I write the vector words to a file.
> write(words, file="words.txt")
However, the file words.txt contains only the values but not the names(apple, pie etc.).
$ cat words.txt
10 14 5 4
Thanks.
words is a named vector, the distinction is important in the context of the cloud() function if I read the help correctly.
Write the data out correctly to a file:
write.table(words, file = "words.txt")
Create your word occurrence file like the txt file just written. When you read it back into R, you need to do a little manipulation:
> newWords <- read.table("words.txt", header = TRUE)
> newWords
x
apple 10
pie 14
orange 5
fruit 4
> words <- newWords[,1]
> names(words) <- rownames(newWords)
> words
apple pie orange fruit
10 14 5 4
What we are doing here is reading the file into newWords, then subsetting it to take the one and only column (variable), which we store in words. The last step is to take the row names from the file we read in and apply them as the names on the words vector. We do the last step using the names() function.
Yes, 'vector' is the proper term.
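The read-back manipulation above can also be collapsed into one step with setNames() (same result, just more compact; a temp file is used here for illustration):

```r
words <- c(apple = 10, pie = 14, orange = 5, fruit = 4)
f <- tempfile()
write.table(words, file = f)

newWords <- read.table(f, header = TRUE)
# attach the row names to the column values in one call
words2 <- setNames(newWords[, 1], rownames(newWords))
```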
EDIT:
A better method than write.table would be to use save() and load():
save(words, file="svwrd.rda")
load(file="svwrd.rda")
The save/load combo preserves all the structure rather than doing coercion. The write.table followed by names()<- is kind of a hassle, as you can see in both Gavin's answer here and my answer on rhelp.
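A quick round trip showing the point (a temp file is used here instead of the fixed name, which is my own substitution):

```r
words <- c(apple = 10, pie = 14, orange = 5, fruit = 4)
f <- tempfile(fileext = ".rda")
save(words, file = f)

rm(words)
load(f)   # restores `words` under its original name, names intact
```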
Initial answer:
I suggest you use as.data.frame() to coerce to a data frame and then write.table() to write to a file.
write.table(as.data.frame(words), file="savew.txt")
saved <- read.table(file="savew.txt")
saved
words
apple 10
pie 14
orange 5
fruit 4