Filling in values in a blank data frame - r

I have a data frame with a number of columns I read in, and now I want to add certain pieces only to certain columns.
For example, the variable periodicnumber exists in the data frame called df, and I want to give the first six rows the values 1 through 6. I thought the code below would work, but I get this error:
periodicnumber=seq(1,6)
df$periodicnumber=periodicnumber
Error in `$<-.data.frame`(`*tmp*`, "periodicnumber", value = 1:6) :
replacement has 6 rows, data has 0
As in, were this in Excel, I would write the numbers 1 through 6 only on the periodicnumber column.

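If you only want to change the first six rows of df, you need to specify that in the assignment:
periodicnumber=seq(1,6)
df$periodicnumber[1:6]<-periodicnumber
More generally:
df$column[1:length(x)]<-x
This only works when df already has at least length(x) rows. The "data has 0" in the error means df currently has no rows, so the rows have to be created first.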
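A minimal sketch of the indexed assignment, using a made-up data frame (the `name` column and the ten NA rows are assumptions for illustration, not taken from the question). The rows are created up front because assigning through `$` cannot add rows to a data frame:

```r
# Hypothetical data frame: 10 empty rows, two columns
df <- data.frame(
  periodicnumber = rep(NA_integer_, 10),
  name = rep(NA_character_, 10),
  stringsAsFactors = FALSE
)

x <- seq(1, 6)

# Fill only the first length(x) rows; the remaining rows stay NA
df$periodicnumber[seq_along(x)] <- x

print(df$periodicnumber)
```

seq_along(x) is used here instead of 1:length(x) so that an empty x selects no rows at all.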

Related

How to split rows within a dataframe for a target column with multiple/nested values

Given a dataframe in which one column x holds nested or multiple values for some rows, how would I replace each such row with duplicate rows that each carry a single value of x?
To explain better, see "mock dataframe pre-transform" below. Row 1 has the values "webui, cli, mobile" for column "module", and I want to append three near-copies of row 1 to the dataframe: one with module value "webui", one with "cli" and one with "mobile". I then want to remove the original row 1. A similar operation would occur for row 4, so that the final dataframe has 7 rows (see "mock dataframe post-transform" below).
mock dataframe pre-transform
mock dataframe post-transform
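One possible approach in base R, sketched on invented data (the `id` column and all values other than "webui, cli, mobile" are assumptions, since the mock tables are not shown): split the multi-valued column, repeat each row once per value, then overwrite the column with the individual values.

```r
# Hypothetical stand-in for the "pre-transform" data frame
df <- data.frame(
  id = 1:4,
  module = c("webui, cli, mobile", "api", "db", "webui, cli"),
  stringsAsFactors = FALSE
)

# Split each module string on commas (plus any following spaces)
parts <- strsplit(df$module, ",\\s*")

# Repeat every row once per value it holds, then fill in the single values
out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
out$module <- unlist(parts)
rownames(out) <- NULL

print(out)
```

Because the original rows are replaced rather than appended to, there is no separate step for removing row 1 or row 4.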

How do I edit my data frame (multiply columns) in R?

I have a CSV file with 11 columns. I am ignoring the first 8 for now; the last 3 (9 - 11) are the important ones. I am missing some data for column 9, and these cells show up as NA. To fill in these cells, I can multiply column 11 by column 10.
I want to create a data frame where all of column 9 is filled in and save that as a new CSV file. I first tried to multiply the columns. This worked and I got the missing data from column 9. Then I tried to merge the new column 9 with the column 9 from my data frame, but R just attached the 2 columns together.
I would like the calculated values to replace the NA cells in the original data frame (so I end up with a full column 9). I would also like to multiply the columns only for rows with NA cells, so that no original data is replaced. How can I do that?
col_9 <- matrix(dat[,10] * dat[,11], ncol=1)
print(col_9)
You can use ifelse function:
dat[,9]=ifelse(is.na(dat[,9]), dat[,10]*dat[,11], dat[,9])
Where the condition is TRUE (i.e. is.na(dat[,9])), the value is replaced by the second argument (dat[,10]*dat[,11]); otherwise the third argument is used (i.e. dat[,9], so the original value is kept).
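A small runnable sketch of that ifelse call, followed by saving the result as a new CSV. The data, the column names `total`, `price` and `qty`, and the temporary output file are all made up for illustration:

```r
# Hypothetical data: 8 placeholder columns, then columns 9 to 11
dat <- data.frame(matrix(1, nrow = 3, ncol = 8))
dat$total <- c(6, NA, 20)  # column 9, with one missing value
dat$price <- c(2, 3, 4)    # column 10
dat$qty   <- c(3, 5, 5)    # column 11

# Fill only the NA cells of column 9; existing values are left as they are
dat[, 9] <- ifelse(is.na(dat[, 9]), dat[, 10] * dat[, 11], dat[, 9])

# Save the completed data frame as a new CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, row.names = FALSE)

print(dat[, 9:11])
```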

Update a data frame within a for loop

The point of this question is that I want to know how to update a data frame inside either a for loop or a function. I know there are other ways to do the specific task I am looking at, but I want to know how to do it the way I am trying to do it.
I have a data frame with 15 columns and 2k observations, some of which contain the values 98 or 99. For each row where there is a 98 or 99 in any variable/column, I want to remove the whole row. I created a function to filter by variable name not equal to 98/99 and applied it with lapply. However, instead of continually updating the data frame, it just spits out a series of data frames, each overwriting the previous one, meaning that at the end I only get a data frame with the last column cleaned. How do I get it to update the data frame for each column sequentially?
nafunction = function(variable){
  kuwait5 = kuwait5 %>%
    filter(variable < 90)
}
lapply(kuwait5, nafunction)
The expected result is a new data frame with all rows that contain a 98 or 99 removed. What I get is a sequence of data frames, each one having only ONE column from which those rows are removed.
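One way to get a sequential update, sketched under assumptions. The assignment inside nafunction only changes a local copy of kuwait5, so nothing persists from one call to the next. Reassigning the data frame inside a for loop at the top level does persist. The data below is invented, and base R subsetting is used here in place of filter():

```r
# Hypothetical stand-in for kuwait5, with 98/99 as the codes to drop
kuwait5 <- data.frame(
  q1 = c(1, 98, 3, 4, 5),
  q2 = c(2, 5, 99, 1, 3),
  q3 = c(4, 4, 4, 2, 98)
)

# Each pass overwrites kuwait5, so the next column is checked
# against the already-reduced data frame
for (col in names(kuwait5)) {
  keep <- !(kuwait5[[col]] %in% c(98, 99))
  kuwait5 <- kuwait5[keep, , drop = FALSE]
}

print(kuwait5)
```

Using %in% rather than a `< 90` comparison means rows with genuine NA values are kept instead of turning into all-NA rows.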

Replicating rows in a data frame using information in other columns

I have the following data frame NewTests:
The problem: I want to replicate the rows in the data frame using the information in the 'duration' column. In this case the replication factor is 1, but it can be anything: 2, 3, 4, etc. In the replicated rows I want to add a new column called 'Date', which contains information from the PromotionStartDate and PromotionEndDate columns.
E.g. in this case, the Date column should contain 2017-04-01 for the entries shown. But in another case, where duration is 2, PromotionStartDate = 2017-03-01 and PromotionEndDate = 2017-05-01, replicated row 1 should contain 2017-03-01 in the Date column and replicated row 2 should contain 2017-04-01.
I am trying to use the following solution to work out my problem:
library(splitstackshape)
newConrtols=expandRows(NewTests,"duration",drop=FALSE)%>%
group_by(CustomerNumber,PromotionID,RewardAssigned,RunID,ModelID)%>%
mutate(Date=seq(as.Date(PromotionStartDate),as.Date(PromotionEndDate),by="month")[1:duration])
But this gives the error:
Error in mutate_impl(.data, dots) : 'from' must be of length 1
What am I doing wrong in the solution?
That is quite simple: you just select the right row or column, as in the following example.
Let's say I have a data frame like yours.
a=c(1,2,3,4)
b=c("a","b","c","d")
t=data.frame(a,b)
Now, if I want the second row, I would type
t[2,]
If I want the second column, I would type
t[,2]
If I want to copy that into a new third column, I would do
t[,3]=t[,2]
In your case, if you wanted to copy the dates from column PromotionDate to FinalDate, you could write this line (with 7 and 6 standing for the positions of those two columns):
your_variable_with_the_dataframe[,6]=your_variable_with_the_dataframe[,7]
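For the row replication itself, here is a base R sketch on invented data (only the column names come from the question; the values are assumptions). The 'from' must be of length 1 error most likely arises because seq() expects a single start date, while inside mutate() PromotionStartDate holds one value per row of the group. The sketch therefore computes each date from a single start value and a month offset:

```r
# Hypothetical rows: one promotion with duration 1, one with duration 2
NewTests <- data.frame(
  CustomerNumber = c(1, 2),
  PromotionStartDate = as.Date(c("2017-04-01", "2017-03-01")),
  PromotionEndDate = as.Date(c("2017-05-01", "2017-05-01")),
  duration = c(1, 2)
)

# Repeat each row 'duration' times
idx <- rep(seq_len(nrow(NewTests)), NewTests$duration)
expanded <- NewTests[idx, , drop = FALSE]
rownames(expanded) <- NULL

# Month offset of each copy within its original row: 0, 1, 2, ...
offset <- sequence(NewTests$duration) - 1

# Step each copy forward from its own start date, one row at a time
expanded$Date <- expanded$PromotionStartDate
for (i in seq_len(nrow(expanded))) {
  months <- seq(expanded$PromotionStartDate[i], by = "month",
                length.out = offset[i] + 1)
  expanded$Date[i] <- months[offset[i] + 1]
}

print(expanded)
```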

Naming the number of the row in a data frame that contains a certain value

I've done some thorough research and I am struggling with an attempt to find a function that will name the number of the row (in my data frame the rows don't contain numbers) that contains a certain value. In this case a number.
e.g. Call the data frame = df
I don't know how to show a little image of the data frame but say that in row 5, column 4 the value was '162', is there a function I could use that will end with the return being '5' or 'row 5'?
I have used rowSums(df=="162")
which gives a long line of the rows: if a row contains the value there is a '1' under it, if not a '0'. But I need a function that simply states the row.
I couldn't figure out how to correctly use the 'which' function either.
which(df$col4=='162')
This assumes that col4 is the name of column 4.
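A short sketch of both lookups on a made-up data frame (the column names and values are assumptions). The first searches one named column; the second uses which() with arr.ind = TRUE to search every column and report row and column positions:

```r
# Hypothetical data frame with 162 in row 5 of col4
df <- data.frame(
  col1 = c(10, 20, 30, 40, 50),
  col4 = c(7, 8, 9, 11, 162)
)

# Row number(s) where col4 equals 162
row_in_col4 <- which(df$col4 == 162)

# Row and column positions of 162 anywhere in the data frame
hits <- which(df == 162, arr.ind = TRUE)
row_anywhere <- unname(hits[, "row"])

print(row_in_col4)
print(hits)
```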
