DataFrame gets cut off before an empty row in an Excel file - Julia

I have an Excel file with 79 rows and 12 columns.
The problem is that when I run the following code, I only get 8 rows and 1 column.
Row 9 is an empty row, and I think this is the problem, but I don't know how to get around it. I have tried using missing, ismissing, allowmissing and dropmissing in different ways, but nothing I've tried has worked.
using DataFrames, XLSX
df = DataFrame(XLSX.readtable("file.xlsx")...)
println(df)
Is the empty row a problem, and how can I handle that empty row? Or is there something else that might cause the problem that I only get 8 rows and 1 column?

From the XLSX.jl documentation:
stop_in_empty_row is a boolean indicating whether an empty row marks the end of the
table. If stop_in_empty_row=false, the TableRowIterator will continue to fetch rows
until there's no more rows in the Worksheet. The default behavior is stop_in_empty_row=true.
So set stop_in_empty_row=false to keep reading past the empty row instead of stopping there.
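As a minimal sketch of that keyword in use: the demo workbook, its path, the column names and the sheet name "Sheet1" below are made up for illustration, and the DataFrame(...) call without splatting assumes XLSX.jl 0.8 or later, where readtable returns a Tables.jl-compatible object (on older versions, splat the result as in the question).

```julia
using DataFrames, XLSX

# Build a small demo workbook with a blank row in the middle (hypothetical data).
path = joinpath(mktempdir(), "demo.xlsx")
XLSX.openxlsx(path, mode="w") do xf
    sheet = xf[1]            # the default sheet of a new workbook
    sheet["A1"] = "id"
    sheet["B1"] = "value"
    sheet["A2"] = 1
    sheet["B2"] = 10.5
    # row 3 is intentionally left empty
    sheet["A4"] = 2
    sheet["B4"] = 20.5
end

# Default settings: per the documentation, an empty row ends the table.
cut = DataFrame(XLSX.readtable(path, "Sheet1"))

# Keep fetching rows until the worksheet has no more rows.
full = DataFrame(XLSX.readtable(path, "Sheet1"; stop_in_empty_row=false))

println("default: ", nrow(cut), " rows; stop_in_empty_row=false: ", nrow(full), " rows")
```

If the result still has only one column, the column range may need to be passed explicitly as the third positional argument of readtable (for example "A:L" for 12 columns); whether that applies depends on how the header row of the actual file is laid out.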

Related

How do I add a column with a different number of rows?

I am calculating returns in R and trying to add them to the data frame I am working with, but it doesn't work because the row counts differ: the existing data frame has 194 rows, while the assigned data has 193.
This code works just fine when doing it on its own:
diff(log(capm$price_Ford))
But when I try to assign it to the data frame as its own column, I get an error:
capm$ford_ret <- diff(log(capm$price_Ford))
How can I assign the data with 193 rows, to a dataframe with 194 rows?
In a nutshell, you can’t. Each column in a table must have the same number of rows. You need to decide what to fill into the row that’s missing a value. Depending on your use-case, this might for example be 0 or NA. You also need to decide whether the missing value should go at the beginning or at the end (for a difference, usually at the beginning). For example:
capm$ford_ret <- c(NA, diff(log(capm$price_Ford)))

Finding whether the same value is selected in each column for any individual row

I'm trying to see if any person endorsed the same response option for each item on a scale. If the person did do that, I want to create a new variable in the dataset that has a 1 in their row, and a 0 if they didn't. There are 4 rows of NAs in the columns I'm using, and I don't want to delete the rows.
I've tried ifelse(rowVars(dro[cols]) == 0, 1, 0) but the 4 NAs get removed, so I can't place the result back into the original dataset.
I've tried rowMins and rowMaxs, which end up somehow shifting the NAs 4 rows up. I've also tried apply() which does the same thing as rowMins.
If anyone has any advice on how to proceed, I would be very appreciative.

How to add a record to a dataframe using rbind in R

I am trying to add rows to a dataframe using rbind in R. However, the dataframe is not being updated each time I attempt to add a row. In other words, the following code results in a dataframe with 2 columns but 0 observations when it should have 2 columns with 2 observations.
modeldata2<-data.frame(Model=character(),Accuracy=numeric())
modelname<-"A"
accuracystr2<-2.2
rbind(modeldata2,list(modelname,accuracystr2))
modelname<-"B"
accuracystr2<-3.2
rbind(modeldata2,list(modelname,accuracystr2))
I am using this at the end of a loop to record values and therefore need to first initialize an empty dataframe and then add records to the dataframe at the end of each loop. The code above is just an example that I am using to troubleshoot the problem. I have also tried using c instead of list but the result was the same.

R: stacking up values from rows of a data frame

I started programming in R yesterday (literally), and I am having the following issue:
-I have a data frame containing R rows, and each row contains N values.
Rows are identified by the first and second field, while the other N-2 are just numerical values or NA.
-Some rows have identical first field and identical second field, something like:
row 1: a,b, third_field, .. ,last_field
row 2: a,b, third_field, .. ,last_field
The rule is that the first row will usually have fields containing some numbers and some NAs, while the second row will also contain NAs and numbers, but distributed differently.
What I am trying to do is to merge the two rows (or records) according to these two rules:
1) if both rows have a NA on a given field, I keep NA
2) if one of the two has a number, I use that value; if both of the rows contain the same value, I keep it also.
How can I do this without looping over each field of each row? (With 1M rows and tens of fields, a loop might finish tomorrow.)
I do not know how to explain my problem better. Sorry for the lengthy explanation, and thanks a lot.
EDIT: it is better if I add an example. The following two lines
a,b,NA,NA,NA,1,2 ,NA
a,b,NA,3 ,NA,1,NA,NA
should become
a,b,NA,3 ,NA,1,2 ,NA

Missing values when excluding rows

I have a data frame of about 10,000,000 entries. There are only two columns: 'value' and 'deleted'. The values usually range from 1 to 1800, but there are also some odd strings. 'deleted' is a boolean indicating whether the value was deleted. If I copy this data frame with the condition
deletedFrame <- df[df$deleted!=0, ]
the resulting data frame reduces to 283 entries. However, it doesn't copy over any of the corresponding values. That column is there but is left blank. Any ideas on what I'm doing wrong?
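It could be that the deleted column contains NA values alongside the booleans. In that case the comparison returns NA for those rows, and indexing with NA produces rows filled with NA. One way to handle this is to exclude them explicitly: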
df[df$deleted!=0 & !is.na(df$deleted), ]
