The point of this question is that I want to know how to update a dataframe inside of either a for loop or a function. So i know there are other ways to do the specific task i am looking at, but i want to know how to do it the way i am trying to do it.
I have a data frame with 15 columns and 2k observations with some 98 and 99s. For each row in where there is a 98 or 99 for any variable/column, I want to remove the whole row. I create a function to filter by variable name not equal to 98/99, and use lapply. however, instead of continually updating the data frame, It just spits out a series of data frames, overwriting the previous data frame, meaning that at the end i will only get a data frame with the last column cleaned. How do i get it to update the data frame for each column sequentially?
nafunction = function(variable){
kuwait5=kuwait5%>%
filter(variable<90)
}
`nafunction = function(variable){
kuwait5=kuwait5%>%
filter(variable<90)
}
lapply(kuwait5, nafunction)`
Expected result is a new data frame with all rows that have an 98 removed. What i get is a sequence of data frames each one having ONE column in which rows with NAS are removed.
Related
I am trying to run the corr.test function in a for loop between a range of columns in a data frame against the rest of the columns in the same data frame. However, I have a lot of NA values throughout this data frame. I don't want to omit the rows altogether and lose the rest of the data in the rows and I also don't want to set NA = 0 because it will interfere with the rest of the data (scores that are either -1, 1, or 0). Every time I try to run the corr.test function, R keeps saying that x or y are not numeric vectors.
Is there any way to get around this?
The first column (rownames) of my data frame is a list of sample IDs, columns 2-50 are scores, and 51 onward are scores of a different type. What I've been doing so far is using for loop to run corr.test between each range of columns like this example:
cor.test(data[1:50], data[51:200])
This works fine in the for loop if I convert NA values to 0 but is there any way to avoid doing that?
I am running the Automatic Variance Ratio (AVR) test on my dataset in R. My Dataset Contains 6 Indices i.e. columns exculing the date column. In this test, I need to use FOR LOOP which would constantly roll over the first column i.e. Date column, and keep changing/moving from the 2nd till the 6th column. I am new to R, therefore, I don't know exactly what to do and how to do it. Currently, I have a code that can run this for only the 2nd column but from the 2nd column onwards it can loop over. All of you are requested to please help me in this regard.
A standard way to loop through the columns of a dataframe is with lapply. If your dataframe is df with 7 columns and you want to loop through columns 2 through 7 and your function is Av.VR() then
output_list <- lapply(df[,2:7], function(x) Av.VR(x))
should yield a list of outputs for each column.
Note I have no experience using the function Av.VR().
I am trying to add rows to a dataframe using rbind in R. However, the dataframe is not being updated each time I attempt to add a row. In other words, the following code results in a dataframe with 2 columns but 0 observations when it should have 2 columns with 2 observations.
modeldata2<-data.frame(Model=character(),Accuracy=numeric())
modelname<-"A"
accuracystr2<-2.2
rbind(modeldata2,list(modelname,accuracystr2))
modelname<-"B"
accuracystr2<-3.2
rbind(modeldata2,list(modelname,accuracystr2))
I am using this at the end of a loop to record values and therefore need to first initialize an empty dataframe and then add records to the dataframe at the end of each loop. The code above is just an example that I am using to troubleshoot the problem. I have also tried using c instead of list but the result was the same.
I have two data frames one with 94 rows and 167 columns (df_1) and the other one with 94 rows and 1 column (df_2) and I would like to do 167 different data frames with each column of the first data frame and the same column of the second data frame, I have tried with a for loop like the next
for (i in seq_len(ncol(df_1))){
df_[[i]] <- data.frame(df_1[sort(rownames(df_1)),i,df_2[sort(rownames(df_2)),])
}
But it does not work, can someone help me?
I think to join two df if they have similar column name use the below code
library(gtools)
df3<- smartbind(df1,df2) ####df1 with 167 column and df2 with 1 column
this will give you a single data frame and to create various data frames use the answer used. in the below thread:
How to loop through the columns in an R data frame and create a new data frame using the column name in each iteration?
I need to create a bunch of subset data frames out of a single big df, based on a date column (e.g. - "Aug 2015" in month-Year format). It should be something similar to the subset function, except that the count of subset dfs to be formed should change dynamically depending upon the available values on date column
All the subsets data frames need to have similar structure, such that the date column value will be one and same for each and every subset df.
Suppose, If my big df currently has last 10 months of data, I need 10 subset data frames now, and 11 dfs if i run the same command next month (with 11 months of base data).
I have tried something like below. but after each iteration, the subset subdf_i is getting overwritten. Thus, I am getting only one subset df atlast, which is having the last value of month column in it.
I thought that would be created as 45 subset dfs like subdf_1, subdf_2,... and subdf_45 for all the 45 unique values of month column correspondingly.
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)){
subdf_i <- subset(df, mnth == uniqmnth[i])
i==i+1
}
I hope there should be some option in the subset function or any looping might do. I am a beginner in R, not sure how to arrive at this.
I think the perfect solution for this might be use of assign() for the iterating variable i, to get appended in the names of each of the 45 subsets. Thanks for the note from my friend. Here is the solution to avoid the subset data frame being overwritten each run of the loop.
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)){
assign(paste("subdf_",i,sep=""), subset(df, mnth == uniqmnth[i])) i==i+1
}