R - Can I have a matrix with different number of columns for rows? - r

This might be a stupid question. I have some NA values in a matrix that I need to pass to a JAGS model, and I want to remove them. Can I remove only the NAs but keep the rest of the data?
My data looks like the picture below. Can I have rows with different numbers of columns?

You cannot.
You need to impute these missing values or remove either the column or the row entirely.
Imputing missing values is as complicated as you want it to be. You'd be best off looking into the first few Google search results on the topic, or just using the mean value of the column.
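
As a rough illustration of the three options mentioned above (drop rows, drop columns, or fill with the column mean), here is a minimal base R sketch. The matrix m and its numbers are made up for the example; they are not the asker's data.

```r
# Hypothetical 3 x 3 matrix with two missing values (made-up numbers)
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              7, 8, 9), nrow = 3, byrow = TRUE)

# Option 1: drop every row that contains at least one NA
m_complete_rows <- m[complete.cases(m), , drop = FALSE]

# Option 2: drop every column that contains at least one NA
m_complete_cols <- m[, colSums(is.na(m)) == 0, drop = FALSE]

# Option 3: replace each NA with the mean of its own column
col_means <- colMeans(m, na.rm = TRUE)
na_idx <- which(is.na(m), arr.ind = TRUE)   # (row, col) positions of the NAs
m_imputed <- m
m_imputed[na_idx] <- col_means[na_idx[, "col"]]
```

Whichever option is used, the result is still rectangular, which is what a matrix requires.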

Related

Removing NA values from rows, without removing the rows in R

I have a dataset in which the data would look something like this:
a fragment dataframe with the data
So lots of NAs per row, but also regular answers that I want in the final version.
Is it possible to remove the NAs, but without removing the rows as a whole?
I thought about pivoting and removing rows with NA, but then it would just remove the occurrences that have actual answers as well.
The data is coming from a decision-making procedure in Qualtrics, in which not every option is displayed to the participants (hence the NAs), but we do not want to exclude people at any step. I also thought about recoding the values and subsetting them somehow, but I can't see how that would work out when it comes to the actual analysis.
I tried removing the NAs, as well as pivoting the table and removing them later.
I do not yet have the full dataset, but want to experiment on strategies of data analysis before I have the data collected, to not get lost once I have it.
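
One way to try out the pivoting idea before the real data arrives is a small base R sketch like the one below. The data frame wide and its column names (id, opt1, opt2, opt3) are invented for illustration. In long format each row is a single participant-option cell, so dropping the NA rows removes only the cells that were never shown, not the whole participant.

```r
# Hypothetical wide data: one row per participant, one column per option
wide <- data.frame(
  id   = c(1, 2, 3),
  opt1 = c(5, NA, 2),
  opt2 = c(NA, 3, NA),
  opt3 = c(1, NA, NA)
)

# Reshape to long format: one row per (participant, option) cell
long <- data.frame(
  id     = rep(wide$id, times = ncol(wide) - 1),
  option = rep(names(wide)[-1], each = nrow(wide)),
  value  = unlist(wide[-1], use.names = FALSE)
)

# Drop only the cells that hold NA; every participant with an answer remains
long_answered <- long[!is.na(long$value), ]
```

In this made-up example all three participants are still present in long_answered, each with only the options they actually answered.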

Subsetting rows, changing values, and placing them back into matrix?

I hope this has not been answered, but when I search for a solution to my problem I am not getting any results.
I have a data.frame of 2000+ observations and 20+ columns. Each row represents a different observation and each column represents a different facet of data for that observation. My objective is to iterate through the data frame and select observations that match certain criteria (e.g., I am trying to pick out observations that are in certain states). After this, I need to subtract or add time to convert each observation to its appropriate time zone (all of the times are in CST). What I have so far is an exorbitant number of subsetting commands that pick out the rows for the state being checked. When I try to write a for loop I can only get one value returned, not the whole row.
I was wondering if anyone had any suggestions or knew of any functions that could help me. I've tried just about everything, but I really don't want to have to go through each state of observations and modify the time. I would prefer a loop that could easily go through the data, select rows based on their state, subtract or add time, and then place the row back into its original data.frame (replacing the old value).
I appreciate any help.
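
A loop is usually not needed for this kind of update. The sketch below is one possible approach, under assumptions: the data frame obs, its columns, and the offset_hours lookup are all hypothetical, and the offsets are fixed hour shifts rather than a full time zone conversion.

```r
# Hypothetical data: every time was recorded as a Central clock reading.
# tz = "UTC" is used only as a neutral container so that adding seconds
# is not affected by daylight-saving rules.
obs <- data.frame(
  state = c("TX", "NY", "CA", "TX"),
  time  = as.POSIXct(c("2015-08-01 12:00:00", "2015-08-01 13:30:00",
                       "2015-08-01 09:15:00", "2015-08-02 18:45:00"),
                     tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: hours to add to a Central time for each state
offset_hours <- c(TX = 0, NY = 1, CA = -2)

# Vectorized update: look up each row's offset by state and add it
# (POSIXct arithmetic works in seconds)
obs$local_time <- obs$time + unname(offset_hours[obs$state]) * 3600

# The same idea for one state at a time: select rows, change them, write back
is_ny <- obs$state == "NY"
obs$time_fixed <- obs$time
obs$time_fixed[is_ny] <- obs$time[is_ny] + 1 * 3600
```

Assigning into obs$time_fixed[is_ny] replaces only the selected rows and leaves the rest of the data frame untouched.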

Comparing dates in a cell

I am trying to write a SUMIFS formula in Google Sheets that sums based on a number of variables held in cells. I want to be able to vary the dates in two cells to change the range that is summed. My formula looks like:
=SUMIFS(D2:D500,A2:A500,">8/01/15",A2:A500,"<9/01/15",F2:F500,C1012)
I want to be able to replace the two dates with cells. When I do, I get a formula parse error. I have seen a lot of questions about doing this for formatting, but not in this context.
Can anyone help?
Assuming your dates are in I1 and J1, please try the following, which uses & to join each comparison operator to the cell value:
=SUMIFS(D2:D500,A2:A500,">"&I1,A2:A500,"<"&J1,F2:F500,C1012)

Making a data frame of only outliers of a large data set

Instead of trying to remove outliers from a data set, I am trying to create a new data frame consisting only of the rows that have outliers in them.
I was able to column-bind the averages and standard deviations of the different groups onto the end of the data set. Now, I have tried this code to produce a table of outlier data:
Outliers <- Sample[((Sample$x - Sample$Averages)/Sample$StDevs) > 2.00,]
This process runs, but produces an empty table for Outliers. I tested some individual values from the data to make sure outliers existed, and they do. If I specify a row, the above calculation does produce a Boolean value. It is when I try to collect these outliers in a table that I have problems. I also tried initializing Outliers as a data.frame or data.table, but was unsuccessful there as well (probably just because I am new to R).
ex:
When I run
((Sample$x[3] - Sample$Averages[3])/Sample$StDevs[3]) > 2
it returns TRUE. This is good. Why, then, do I get an empty table of outliers when I simply want to KEEP everything in Sample where this condition is true? I do not feel that this should be a difficult problem, but I cannot for the life of me get it to work.
Any suggestions? Thanks in advance!
Sample[ 0, ] should get you an empty dataframe with no rows and the same column names.
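
For comparison, here is a minimal sketch of the filter described in the question, run on made-up data. The group column and the numbers are assumptions; Averages and StDevs are computed here with ave() rather than column-bound as in the question. It also uses abs() so that low outliers are caught, and which() so that NA comparisons are dropped; both are choices made for this sketch, not something the question specifies.

```r
# Hypothetical data: two groups of ten values, with one obvious outlier (30)
Sample <- data.frame(
  group = rep(c("a", "b"), each = 10),
  x     = c(10, 11, 9, 10, 10, 11, 9, 10, 10, 30,
            5, 6, 5, 4, 5, 6, 4, 5, 5, 5)
)

# Per-group mean and standard deviation, repeated on every row of the group
Sample$Averages <- ave(Sample$x, Sample$group, FUN = mean)
Sample$StDevs   <- ave(Sample$x, Sample$group, FUN = sd)

# z-score of each row relative to its own group
z <- (Sample$x - Sample$Averages) / Sample$StDevs

# Keep only rows more than 2 standard deviations from the group mean
Outliers <- Sample[which(abs(z) > 2), ]
```

If the same filter returns nothing on real data, inspecting table(abs(z) > 2, useNA = "ifany") shows how many rows actually satisfy the condition.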

Column means over a finite range of rows

I am working with climate data in New Mexico and I am an R novice. I am trying to replace NA with means, but there are 37 different sites in my df. I want the means of each column computed only within a single DF$STATION.NAME (in column 1). I can't use data from one location to find the mean of another, obviously. So really I should have a mean for each month, for each station.
My data is organized with station.name running vertically in column 1 and readings for the months Jan-Dec in the following columns, with a total column at the far right. There is one reading for each station for each month, over several years (the station name is repeated in a new row for each new year).
I need to replace the NAs with the sums of the CLDD for the given month within the given station.name. How do I do this?
Try asking that question on https://stats.stackexchange.com/ (as suggested by the statistics tag); there are probably more R users there than on the general programming site. I also added the r tag to your question.
There is nothing wrong with splitting your data into station-month subsets, filling the missing values there, then reassembling them into one big matrix!
See also:
Replace mean or mode for missing values in R
Note that filling missing values with means, medians or modes is popular, but it may distort your results, since it reduces variance. Unless you have a strong physical argument for why and how the missing values can be interpolated, it would be more elegant to find a method that can deal with missing values directly.
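
The station-by-station fill described above can be sketched in base R as follows. This is only an illustration under assumptions: the data frame climate, its two month columns, and the helper fill_with_group_mean are hypothetical, and it fills with the station mean for each month rather than a sum. The variance caveat above still applies.

```r
# Hypothetical data: one row per station per year, one column per month
climate <- data.frame(
  STATION.NAME = c("A", "A", "A", "B", "B", "B"),
  JAN = c(10, NA, 14, 100, 110, NA),
  FEB = c(NA, 20, 22, 200, NA, 220)
)
month_cols <- c("JAN", "FEB")

# Hypothetical helper: replace NAs in x with the mean of x within each group g
fill_with_group_mean <- function(x, g) {
  group_means <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))
  ifelse(is.na(x), group_means, x)
}

# Apply the helper to every month column, grouping by station
filled <- climate
filled[month_cols] <- lapply(climate[month_cols], fill_with_group_mean,
                             g = climate$STATION.NAME)
```

Because each month already has its own column, grouping by station within a column gives one mean per station per month, so no station borrows values from another.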
