Replicating rows in a data frame using information in other columns - r

I have the following data frame NewTests:
The problem: I want to replicate the rows in the data frame using the information in the 'duration' column. In this case the replication factor is 1, but it can be any value (2, 3, 4, etc.). In the replicated rows I want to add a new column called 'Date', which contains information from the PromotionStartDate and PromotionEndDate columns.
E.g. in this case, the Date column should contain 2017-04-01 for the entries shown. But in another case, where duration is 2, PromotionStartDate = 2017-03-01 and PromotionEndDate = 2017-05-01, replicated row 1 should contain 2017-03-01 in the Date column and replicated row 2 should contain 2017-04-01.
I am trying to use the following solution to work out my problem:
library(splitstackshape)
newConrtols=expandRows(NewTests,"duration",drop=FALSE)%>%
group_by(CustomerNumber,PromotionID,RewardAssigned,RunID,ModelID)%>%
mutate(Date=seq(as.Date(PromotionStartDate),as.Date(PromotionEndDate),by="month")[1:duration])
But this gives the error:
Error in mutate_impl(.data, dots) : 'from' must be of length 1
What am I doing wrong in the solution?

That is quite simple: you just select the right row or column, as in the following example.
Let's say I have a data frame like yours.
a=c(1,2,3,4)
b=c("a","b","c","d")
t=data.frame(a,b)
Now, if I want the second row, I would type
t[2,]
And if I want the second column, I would type
t[,2]
If I want to copy the second row into a fifth row, I would do
t[5,]=t[2,]
In your case, if you wanted to copy the dates from the column PromotionDate to FinalDate, and these were columns 7 and 6, you could write this line:
your_variable_with_the_dataframe[,6]=your_variable_with_the_dataframe[,7]
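For the question as asked (repeat each row 'duration' times and fill Date month by month), here is a minimal base R sketch. The NewTests data below is a made-up stand-in, since the real data frame is not shown, and only some of its columns are included. The error in the question most likely arises because, inside mutate, PromotionStartDate is the whole column for the group, so seq() receives a 'from' of length greater than 1 whenever a group has more than one row. Building one sequence per original row avoids that.

```r
# Hypothetical stand-in for NewTests (the real data is not shown in the question)
NewTests <- data.frame(
  CustomerNumber     = c(1, 2),
  PromotionStartDate = as.Date(c("2017-04-01", "2017-03-01")),
  PromotionEndDate   = as.Date(c("2017-05-01", "2017-05-01")),
  duration           = c(1, 2)
)

# Repeat each row index 'duration' times, then pick those rows
idx <- rep(seq_len(nrow(NewTests)), times = NewTests$duration)
expanded <- NewTests[idx, ]
rownames(expanded) <- NULL

# One monthly sequence per ORIGINAL row, so seq() always gets a single start date
dates <- lapply(seq_len(nrow(NewTests)), function(i) {
  seq(NewTests$PromotionStartDate[i], by = "month",
      length.out = NewTests$duration[i])
})

# Concatenate the per-row sequences; the order matches the expanded rows
expanded$Date <- do.call(c, dates)
```

With this made-up input, the row with duration 1 stays a single row and the row with duration 2 becomes two rows, one per month starting at its PromotionStartDate.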

Related

How do I gather data that is spread across in various rows to a single row?

I have a dataframe that has 23 columns of various parameters defining a patient which I extracted using dplyr from a larger dataframe after pivoting it such that each of the parameters forms the columns of the new dataframe.
Now I am facing an issue. I am getting a lot of rows for the same patient. For each parameter, one of the rows shows the required value and the rest is denoted as NA. So if the same patient is repeated, say 10 times, in every parameter column there is one row with the actual value and the rest is NA.
How do I remove these NAs and gather the information that is scattered in this manner?
I want the 1 and 2 to be on the same row. All the rows seen in this image of dataframe are of the same person.
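One possible way to collapse such rows, sketched in base R: group by the patient identifier and keep the first non-NA value in every parameter column. The column names (patient_id, height, weight, age) and the values are invented for illustration, and the sketch assumes each parameter has at most one real value per patient.

```r
# Hypothetical data: one real value per parameter, scattered across rows
patients <- data.frame(
  patient_id = c("p1", "p1", "p1", "p2", "p2"),
  height     = c(170, NA, NA, NA, 160),
  weight     = c(NA, 65, NA, 55, NA),
  age        = c(NA, NA, 40, 31, NA)
)

# Return the first non-missing value of a vector, or NA if there is none
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA else x[1]
}

# Apply the helper to every parameter column within each patient
collapsed <- aggregate(
  patients[-1],
  by  = list(patient_id = patients$patient_id),
  FUN = first_non_na
)
```

The result has one row per patient, with the scattered values gathered onto that row.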

Filling in values in a blank data frame

I have a data frame with a number of columns I read in, and now I want to add certain pieces only to certain columns.
For example, the variable periodicnumber exists in the data frame called df, and I want to give the first six rows the values 1 through 6. I thought the code below would work, but I get the error:
periodicnumber=seq(1,6)
df$periodicnumber=periodicnumber
Error in `$<-.data.frame`(`*tmp*`, "periodicnumber", value = 1:6) :
replacement has 6 rows, data has 0
As in, were this in Excel, I would write the numbers 1 through 6 only on the periodicnumber column.
If you only want to change the first six rows of df, you need to specify that in the assignment:
periodicnumber=seq(1,6)
df$periodicnumber[1:6]<-periodicnumber
More generally:
df$column[1:length(x)]<-x
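A small self-contained sketch of that assignment, using a made-up data frame with eight rows (the symbol column and its values are only for illustration). Note that the error message in the question says the data has 0 rows; if df really is empty, the rows have to exist before any of them can be filled.

```r
# Hypothetical data frame with 8 rows and an empty periodicnumber column
df <- data.frame(
  symbol         = c("H", "He", "Li", "Be", "B", "C", "N", "O"),
  periodicnumber = NA_integer_
)

x <- seq(1, 6)

# Fill only the first length(x) rows; the remaining rows keep their NA
df$periodicnumber[seq_along(x)] <- x
```

seq_along(x) is used instead of 1:length(x) because it also behaves sensibly when x is empty.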

R: Replace column names with row values except where cells equal NA

I have a data frame extracted from a data base that contains different types of data (record types). The different record types have different column names which occupy the first three rows (including header). This data frame is made to be used in excel where you can easily filter out the data by choosing the correct record type.
Here I present small sample of my data frame which in reality contains many more columns (59) as well as rows (34000).
sample <- data.frame(X01RecordType=c("01HL","01CA","HH","HH","HH","HL"), X02Quarter=c(NA,NA,2,2,2,1),X05Gear=c(NA,NA,"KRA","KRA","KRA",NA),X06SweepLngt=c(NA,NA,35,35,-9,-9),
X12Month=c("12SpecCodeType",NA,4,5,4,2), X13Day=c("13SpecCode",NA,26,5,25,160617), X22StatRec=c("22LngtCode","22CANoAtLngt","45G1",NA,NA,NA),X23Depth=c("23LngtClass","23IndWgt",41,NA,63,NA))
As you might see the cells which contain column names are preceded by an X and a number and then a text, e.g. X01RecordType. It would be very easy to replace column names with the first rows by using:
colnames(df) <- df[1,]
However, as you can see some of the cells in the first two rows also contain NA-values. These NA-values indicate that the column names are the same for all record types, using the current header and therefore I would like to keep these. So really what I would like to do is replace the column names with the values of the first row (where record type header equals 01HL) except for NA-values.
If possible I would like to do this without using any external packages. Cells within the data may also contain NA-values and I would like to keep these rows so filtering out all columns containing NA is not an option if it doesn't only apply to the first row. Which is really the way I tried to approach this problem, but I can't figure out how.
I hope this is all the information required to help me out and thanks!
Another option without a loop
colnames(sample)[!is.na(sample[1,])] <- sample[1,][!is.na(sample[1,])]
sample[1:2,]
# 01HL X02Quarter X05Gear X06SweepLngt 12SpecCodeType 13SpecCode 22LngtCode
#1 01HL NA <NA> NA 12SpecCodeType 13SpecCode 22LngtCode
#2 01CA NA <NA> NA <NA> <NA> 22CANoAtLngt
# 23LngtClass
#1 23LngtClass
#2 23IndWgt
I suggest a simple loop:
for(c in 1:length(sample)) if(!is.na(sample[1,c])) colnames(sample)[c] = as.character(sample[1,c])
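The same idea can also be written with ifelse, which may read more clearly: take the first row as character, and fall back to the existing name wherever it is NA. This is only a sketch on a cut-down, hypothetical data frame called rec, not the full 59-column data.

```r
# Hypothetical cut-down version of the data in the question
rec <- data.frame(
  X01RecordType = c("01HL", "01CA", "HH"),
  X02Quarter    = c(NA, NA, 2),
  X12Month      = c("12SpecCodeType", NA, "4")
)

# First row as a character vector, one entry per column (NA stays NA)
first_row <- vapply(rec[1, ], as.character, character(1))

# Keep the current header where the first row is NA, otherwise use the first row
colnames(rec) <- unname(ifelse(is.na(first_row), colnames(rec), first_row))
```

Here X02Quarter keeps its name because its first-row cell is NA, while the other two columns take their names from the first row.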

R - How to get value from a column based on value from another column of same row

I have a data frame (df) with 8 columns and 1200 rows. Among those 8 columns I want to find the minimum value of column 7 and find the corresponding value of column 2 in that particular row where the minimum value of column 7 was found. Also column 2 holds characters so I want a character vector giving me its value.
I found the minimum of column 7 using
min_val <- min(as.numeric(df[, 7]), na.rm = TRUE)
Now how do I get the value from column 2 (variable name of column being 'column.2') corresponding to the row in which column 7 contains value of 'min_val' as calculated above?
This might be a trivial question but I am new to R so any help will be much appreciated.
Use which.min to get the minimum value index. Something like :
df[which.min(df[,7]),2]
Note that which.min only returns the first index of the minimum, so if you've got several rows with the same minimal value, you will only get the first one.
If you want to get all the minimum rows, you can use :
df[which(df[,7]==min(df[,7])), 2]
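A runnable sketch of both variants on a small made-up data frame (three columns instead of eight, with the value column stored as text, which is presumably why the question wraps it in as.numeric). The column names id and score are assumptions.

```r
# Hypothetical data: 'score' plays the role of column 7, 'column.2' of column 2
df <- data.frame(
  id       = 1:4,
  column.2 = c("a", "b", "c", "d"),
  score    = c("3", "1", "2", "1"),
  stringsAsFactors = FALSE
)

# Convert once, so comparisons are numeric rather than alphabetical
vals <- as.numeric(df$score)

# Value of column.2 in the first row where the minimum occurs
first_min <- df$column.2[which.min(vals)]

# Values of column.2 in every row that ties for the minimum
all_min <- df$column.2[which(vals == min(vals, na.rm = TRUE))]
```

Both results are character vectors, as the question asks: first_min has one element and all_min has one element per tied row.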
This is the same answer as juba's, but using the data.table package (his answer uses only base R, with no need to load any libraries).
# Load data.table
library(data.table)
# Convert df to a data.table in place (needed for the column-name syntax below)
setDT(df)
# Get the 2nd column's value corresponding to the first minimum value in the 7th column
df[which.min(V7), V2]
# Get all values in the 2nd column corresponding to the minimum value in the 7th column
df[V7 == min(V7), V2]
For handling data.frame-like objects, data.table is quite handy and helpful, just like the dplyr package. Both are worth a look.
Here I've assumed your columns are named V1..V8. Otherwise, just replace V7 and V2 with the names of the 7th and 2nd columns of your data, respectively.

extract columns that don't have a header or name in R

I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the second two.
I know how to do it when the columns have names ~ plot(data$V1, data$V2) but in this case they do not. How do I access each column individually when they do not have names?
Thanks
Why not give them sensible names?
names(data)=c("This","That","Other")
plot(data$This,data$That)
That's a better solution than using the column number, since names are meaningful and if your data changes to have a different number of columns your code may break in several places. Give your data the correct names and as long as you always refer to data$This then your code will work.
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the second to columns. Here, I didn't use a "1st number" so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it stems from matrix calculations. E.g., a 4x3 dimensional matrix has 4 rows and 3 columns. Thus when I want to select the 1st row of the third column, I could do something like matrix[1,3]
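A short sketch of positional access, assuming the data is held in a matrix with no column names (the numbers are made up). It also shows that converting to a data frame assigns default names, so name-based access becomes available as well.

```r
# Hypothetical 5 x 3 matrix without column names
m <- matrix(c(1:5, (1:5)^2, (1:5)^3), ncol = 3)

# First column, and the second and third columns together, by position
x <- m[, 1]
y <- m[, 2:3]

# as.data.frame() gives unnamed matrix columns the default names V1, V2, V3
d <- as.data.frame(m)
```

From here, plot(x, y[, 1]) or plot(d$V1, d$V2) would draw the first column against the second.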
