How to shift rows down based on value in column R? - r

I have a data frame that looks kind of like this, but much larger.
I want to look at the record_id column and, whenever a row says admin_time, shift the right-side columns down and then set that previous row to NA. When I write it to a csv, I'll just use na = "" to make those cells blank.
For example, in the first few rows, it would look like this...
No need to try to recreate the data frame. I was thinking a for loop with an embedded if statement that checks patient_id, record_id, and pk_day might work. I am looking for alternative suggestions, or for how to use a statement within the loop to pick out the admin line and do what I described above.

Related

How to skip empty rows while reading multiple tabs in R?

I am trying to read an excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?
Perhaps the easiest solution would be to read each tab as it is and then remove the empty rows, e.g. yourdata <- yourdata[!is.na(yourdata$columname),]. This works if you don't expect any NAs in a particular column, such as an id column. If you have data gaps everywhere, you can test for rows that are NA in all of several columns; let me know if that's what you need.
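Both variants of that suggestion can be sketched in base R. This is only an illustration: the data frame raw and the list sheets are made-up stand-ins for tabs that were read without skip, not the asker's data.

```r
# Hypothetical tab contents as they might look after reading without `skip`:
# leading rows that are entirely NA, followed by the real data.
raw <- data.frame(
  id    = c(NA, NA, 1, 2, 3),
  value = c(NA, NA, 10.5, NA, 7.2)
)

# Variant 1: keep rows where a column that should never be missing (here `id`) is present.
by_id <- raw[!is.na(raw$id), ]

# Variant 2: drop only the rows in which every column is NA.
drop_empty_rows <- function(d) d[rowSums(!is.na(d)) > 0, , drop = FALSE]
by_all <- drop_empty_rows(raw)

# The same helper can be applied to every tab once they are held in a list.
sheets  <- list(tab1 = raw, tab2 = raw[-1, ])
cleaned <- lapply(sheets, drop_empty_rows)
```

Variant 2 keeps the row with id 2 even though its value is missing, because only fully empty rows are removed.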

Adding new column with data value in R

I want to add a column (say ForestAreaPerPopn) that holds the ratio of forest area to the resident population (represented by the variable Total below). The data contains the following variables and their values.
How can I add a column named ForestAreaPerPopn to the table ForestAreaPerPop (shown below) so that it contains the ratio of forest area to Total?
Too long for a comment.
You have a couple of problems. First, your column names have spaces and other special characters. This is allowed but creates all kinds of problems later. I suggest you do something like:
colnames(ForestAreaPerPop) <- gsub(' |\\(|\\)', '_', colnames(ForestAreaPerPop))
This replaces any space or left or right parenthesis in the column names with '_'.
Then, something like:
ForestAreaPerPop$n <- with(ForestAreaPerPop, Forest_Area_in_ha/Total)
should give you what you want.
Some advice: long table names and column names may seem like a good idea, but you will live to regret it. Make them short but meaningful (easier said than done).
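As a small worked sketch of the two steps: the real column names are not shown in the question, so the name `Forest Area (in ha)` and the values below are assumptions. With a name like that, replacing each character separately would leave a doubled and a trailing underscore, so this variant collapses each run of spaces and parentheses into a single underscore instead.

```r
# Hypothetical data; the real table is not shown in the question.
ForestAreaPerPop <- data.frame(
  District = c("A", "B"),
  `Forest Area (in ha)` = c(1200, 450),
  Total = c(30000, 9000),
  check.names = FALSE  # keep the awkward name as-is for the demonstration
)

# Collapse every run of spaces/parentheses into one underscore,
# then drop an underscore left at the end of a name.
clean <- gsub("[ ()]+", "_", colnames(ForestAreaPerPop))
clean <- sub("_$", "", clean)
colnames(ForestAreaPerPop) <- clean

# Ratio of forest area to population, stored under the name the question asks for.
ForestAreaPerPop$ForestAreaPerPopn <-
  with(ForestAreaPerPop, Forest_Area_in_ha / Total)
```

Checking colnames(ForestAreaPerPop) after the cleanup step is a quick way to confirm the name that the ratio line needs to reference.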

Referencing last used row in a data frame

I couldn't find the answer in any previously asked questions, but I believe this is an easy one.
I have the below two lines of code, which take in data from excel in a specific range (using readxl for this). The range itself only goes through row 2589 in the excel document, but it will update dynamically (it's a time series) and to ensure I capture the different observations (rows) as they're added, I've included rows to 10000 in the read_excel range argument.
In the end, I'd like to run charts on this data, but a key part of this is identifying the last used row, without manually updating the code row for the latest date. I've tried using nrow but to no avail.
Raw_Index_History <- read_excel("RData.xlsx", range = "Returns!A6:P10000", col_names = TRUE)
Raw_Index_History <- Raw_Index_History[nrow(Raw_Index_History),]
Does anybody have any thoughts or advice? Thanks very much.
It would be easier to answer your question if you included an example.
Without knowing what your data looks like, answers are likely to be a bit vague.
Does your data contain NAs? If not, it should be straightforward to remove the empty rows with
na.omit(Raw_Index_History)
It appears you also have control over the excel spreadsheet. So in case your data does contain NAs you could have some default value in your empty rows that will get overwritten as soon as a new data point is recorded. This will allow you to filter your dataframe accordingly.
Raw_Index_History[!grepl("place_holder", Raw_Index_History$column_with_placeholder),]
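If the real rows may contain some NAs, another option is to locate the last row that holds any value and trim everything after it. The sketch below assumes the over-long range comes back as rows that are NA in every column; the small data frame is a made-up stand-in for the imported sheet.

```r
# Hypothetical stand-in for the imported range: three real observations
# followed by all-NA padding rows.
Raw_Index_History <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", NA, NA)),
  Index = c(100.0, NA, 101.3, NA, NA)
)

# A row counts as "used" if at least one of its cells is not NA.
has_data  <- rowSums(!is.na(Raw_Index_History)) > 0
last_used <- max(which(has_data))

# Keep everything up to the last used row, and pull out the latest observation.
trimmed <- Raw_Index_History[seq_len(last_used), ]
latest  <- trimmed[last_used, ]
```

Unlike na.omit, this keeps the second row, whose Index is missing but whose Date is present.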
If you expect data in the spreadsheet to grow, you can specify only the columns to include, instead of a defined boundary.
Something like this ...
Raw_Index_History <- read_excel("RData.xlsx",
                                sheet = 1,
                                range = cell_cols("A:P"), # Only cols, no rows
                                col_names = TRUE)
Every time you run the code, R will pull in the data from columns between A:P up until the last populated row.
This will be a more elegant approach to your use case. (Consider what you'd do when your data crosses 10000 rows in the future)

R - conditionally modify a dataframe

I want to check to see if a dataframe has a certain number of columns, and then conditionally modify the dataframe based on the number of columns it has.
If I try
ifelse(ncol(Table1)=="desired number of columns",Table1[,ColumnsSelected],Table1[,ColumnsSelected2])
I get Table 1 back, but only one column, and it looks weird. Is there a way that I can change this to make it so that I can return a dataframe with the desired columns based on the number of columns in the dataframe?
I have roughly 700 tables that have 2 types of formatting that I would prefer not to have to individually scan to reformat.
Please advise
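One possible approach, offered as a sketch rather than a definitive answer: ifelse is vectorized and returns a result shaped like its test, and the test here has length one, which is why only a fragment of the table comes back. A plain if/else returns the whole data frame. The function name select_columns, its arguments, and the example tables are all hypothetical.

```r
# Return one of two column subsets depending on how many columns the table has.
select_columns <- function(tbl, expected_ncol, cols_if_match, cols_otherwise) {
  if (ncol(tbl) == expected_ncol) {
    tbl[, cols_if_match, drop = FALSE]   # drop = FALSE keeps a data frame
  } else {
    tbl[, cols_otherwise, drop = FALSE]
  }
}

# Hypothetical tables in the two formats.
Table1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
Table2 <- data.frame(a = 1:2, b = 3:4)

result <- select_columns(Table1, 3, c("a", "b"), "a")

# With many tables, keep them in a list and apply the same rule to each.
tables <- list(t1 = Table1, t2 = Table2)
reformatted <- lapply(tables, select_columns,
                      expected_ncol = 3,
                      cols_if_match = c("a", "b"),
                      cols_otherwise = "a")
```

Holding the roughly 700 tables in a list means the rule is written once and applied with a single lapply call.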

Free text in a dataframe merging and making it in continuous flow for every row

In a dataframe I have 2 columns with free text and I would like to merge them into one. I use cbind(df$col1,df$col2) or rbind(df$col1,df$col2) and I get a merged result, but it contains numbers instead of the text.
What can I do to merge them?
Also in the text there are many spaces like this:
something here
another line here
and we are here
but we also have this
and I would like to turn it into one continuous form like this:
something here another line here and we are here but we also have this
How can I make it?
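A minimal sketch of one way to do both steps in base R. It assumes the numbers appear because the text columns are stored as factors (cbind then shows their integer codes), which the question does not confirm; the sample data frame is made up.

```r
# Hypothetical data with irregular spacing and line breaks inside the text.
df <- data.frame(
  col1 = c("something   here\nanother line here", "first   note"),
  col2 = c("and we   are here\nbut we also have this", "second\nnote"),
  stringsAsFactors = FALSE
)

# paste() joins the two columns element by element with a single space;
# as.character() guards against factor columns being turned into codes.
merged <- paste(as.character(df$col1), as.character(df$col2))

# Collapse every run of whitespace (spaces, tabs, newlines) into one space
# and remove any whitespace left at the ends.
df$merged <- trimws(gsub("\\s+", " ", merged))
```

Each row of df$merged is then a single line of text with one space between words.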
