Combine lapply and gsub to replace a list of values for another list of values - r

I am currently looking for a way to simplify searching through a column within a dataframe for a vector of values and replacing each of those values with another value (also contained within a separate vector). I can run a for loop for this, but it must be possible within the apply family; I'm just not seeing it yet. I'm very new to using the apply family and could use help.
So far, I've been able to replace all instances of the first value in my old vector with the first value in the new vector, but it isn't iterating past the first element. I hope this makes sense. Here is the code I have:
#standardize tank location
old_tank_list <- c("7.C.4","7.C.5","7.C.6","7.C.7","7.C.8","7.C.9","7.C.10","7.C.11")
new_tank_list <- c("7.B.3-4","7.C.3-4","7.C.1-2","7.C.5-6","7.C.7-8","7.C.9-10","7.E.9-10","7.C.11-12")
# gsub() uses only the first element of pattern and replacement (with a warning)
sapply(df_growth$Tank, function(y) gsub(old_tank_list, new_tank_list, y))
Tank is the name of the column I am trying to replace all of these values within. I haven't assigned it back yet, because I want to test the functionality first. Thanks for any help you can offer.
Hopefully, this image will help: the column on the left is before my function is applied, and the column on the right is after. Basically, I just want to batch-change text values.
[Image: Before and After]

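library(dplyr)

# setNames() builds a named vector (names = old values, values = new values);
# !!! splices it into recode() as old = new pairs
df_growth %>%
  mutate(Tank = recode(Tank, !!!setNames(new_tank_list, old_tank_list)))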
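For a base R version without dplyr, the same lookup can be sketched with match(), which compares whole strings exactly, so the "." in the tank codes is not treated as a regex wildcard the way gsub() treats it. The data frame below is a small hypothetical stand-in for df_growth, and Tank_match and tank_reduce are illustrative names not taken from the question. Reduce() is included as the functional, apply-family-style alternative the question was after: it applies one old/new pair per step to the whole column.

```r
# Hypothetical sample data standing in for the asker's df_growth
df_growth <- data.frame(Tank = c("7.C.4", "7.C.10", "7.A.1", "7.C.11"),
                        stringsAsFactors = FALSE)

old_tank_list <- c("7.C.4", "7.C.5", "7.C.6", "7.C.7",
                   "7.C.8", "7.C.9", "7.C.10", "7.C.11")
new_tank_list <- c("7.B.3-4", "7.C.3-4", "7.C.1-2", "7.C.5-6",
                   "7.C.7-8", "7.C.9-10", "7.E.9-10", "7.C.11-12")

# Approach 1: exact lookup. match() returns the position of each Tank value
# in old_tank_list, or NA when the value is not in the list.
idx <- match(df_growth$Tank, old_tank_list)
hit <- !is.na(idx)
df_growth$Tank_match <- df_growth$Tank
df_growth$Tank_match[hit] <- new_tank_list[idx[hit]]  # unmatched values are kept

# Approach 2: Reduce() walks over the pairs, replacing exact matches of
# old_tank_list[k] with new_tank_list[k] across the whole column each step.
# Caution: a new value equal to a later old value would be replaced again.
tank_reduce <- Reduce(
  function(x, k) ifelse(x == old_tank_list[k], new_tank_list[k], x),
  seq_along(old_tank_list),
  init = df_growth$Tank
)

print(df_growth)
print(tank_reduce)
```

Of the two, match() is probably the safer choice here, because each value is looked up only once and replacements cannot chain into one another.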

Related

Replacing df values in a column with values from another df via key

I need to replace values in the Nth column of my df (call these values v1s) with some other values from another df (call them v2s). There is a dictionary, or rather two dictionaries: the first one translates v1s into numbers, and the second one translates the numbers into v2s. I tried merge(), left_join()/right_join(), and a few other things, but nothing seems to work. Can somebody help, please?
Merging the datasets should work; keep adjusting the join until it does.
Otherwise, you can always simply add an extra column to your dataset with
datasetA$newvar <- datasetB$v2s
provided both datasets have the same rows in the same order. Once you have correctly added the second variable, simply drop the first.
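A minimal base R sketch of the two-step translation described in the question, using match() in place of merge() (a swapped-in technique, not the one the answer proposes). dict1, dict2, df, their column names and N are all hypothetical. merge() would also work, but by default it sorts the result by the key, whereas match() keeps the original row order.

```r
# Hypothetical dictionaries: v1 -> number, then number -> v2
dict1 <- data.frame(v1 = c("a", "b", "c"), num = c(1, 2, 3),
                    stringsAsFactors = FALSE)
dict2 <- data.frame(num = c(1, 2, 3), v2 = c("alpha", "beta", "gamma"),
                    stringsAsFactors = FALSE)

# Hypothetical data frame; column N holds the v1 values to translate
df <- data.frame(id = 1:4, code = c("b", "a", "c", "b"),
                 stringsAsFactors = FALSE)
N <- 2

# Step 1: v1 -> number (NA if a value is missing from dict1)
nums <- dict1$num[match(df[[N]], dict1$v1)]
# Step 2: number -> v2 (NA if a number is missing from dict2)
df[[N]] <- dict2$v2[match(nums, dict2$num)]

print(df)
```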

Naming data frames in lists using a sequence

I have a rather simple question. I have a list, and I want to name the data frames in the list according to a sequence. Right now I have a sequence that increases by one letter per data frame (explained below):
nm1 <- paste0("Results_Comparison_",LETTERS[seq_along(Model_comparisons)])
This creates "Results_Comparison_A", "Results_Comparison_B", "Results_Comparison_C", "Results_Comparison_D", etc. What I want is for it to be a number instead of a letter (i.e. Results_Comparison_1, Results_Comparison_2, Results_Comparison_3, etc.). Does anyone know how I could change this? If extra information is needed, let me know!
This should work: nm1 <- paste0("Results_Comparison_", seq_along(Model_comparisons))
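A short sketch applying the answer's fix, where Model_comparisons is a hypothetical list of three small data frames. Assigning to names() names the list elements in place; setNames() would do the same but return a new list.

```r
# Hypothetical list of data frames
Model_comparisons <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))

# seq_along() yields 1, 2, 3, ... so the suffix is a number instead of a letter
nm1 <- paste0("Results_Comparison_", seq_along(Model_comparisons))
names(Model_comparisons) <- nm1

# Elements can now be accessed by name
print(names(Model_comparisons))
print(Model_comparisons$Results_Comparison_2)
```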

How to unnest data and obtain the first element from an array in SparkR?

I am new to SparkR and taking my first steps in data preparation. I was trying to subset the data and select the significant columns. My question is how I can select a column from an array element. I tried something like the following, which let me select columns by un-nesting the data, but I couldn't unnest and flatten the array to get its first element.
select.col <- SparkR::select(data,c("parsed.nid","parsed.status","parsed.sections.element[0].name"))
I found a way to resolve this issue myself. It can be done in two simple steps:
1. Use explode() in SparkR to get all the contents of the list in that column, one element per row.
2. Use windowPartitionBy() in SparkR to create partitions, then apply whatever ranking function the requirements call for, such as row_number(), dense_rank() or rank(). Here we want the first element of the list, so I used row_number().
Snippet:
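data.select <- SparkR::select(data, c("parsed.nid", "parsed.status", "parsed.sections"))
names(data.select) <- c("nid", "status", "sections")
# explode() emits one row per array element, in a column named "col"
categories <- SparkR::select(data.select, data.select$nid, data.select$status, explode(data.select$sections))
# Partition by record so row_number() restarts for each nid/status pair.
# Ordering by "nid" is constant within a partition, so the row ranked 1 is
# not guaranteed to be the array's original first element.
ws <- SparkR::orderBy(SparkR::windowPartitionBy("nid", "status"), "nid")
data.final <- SparkR::mutate(categories, row_num = over(row_number(), ws))
## If we want to get the first element of the array.
data.final <- data.final[data.final$row_num == 1, ]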
Please add your suggestions as well.

Only last iteration of loop is saved

I have a list of dataframes (subspec2) which I want to loop through to get the columns with the maximum value from each dataframe, and write these to a new dataframe. I wrote the following loop:
good.data<-data.frame(matrix(nrow=401, ncol=78)) #create empty dataframe
for (i in length(subspec2)) ##subspec2 is the list of dataframes
{
max.name<-names(which.max(apply(subspec2[[i]],MARGIN=2,max))) #find column name with max value
good.data[,i]<-subspec2[[i]][max.name] #write the contents of this column into dataframe
}
This seems to work, but only the last column gets values; nothing else appears to have been saved. Many threads point out that the data frame must be created outside the loop, but that is not the problem here.
What am I doing wrong?
Thank you!
I believe you need to change for (i in length(subspec2)) to for (i in 1:length(subspec2)). The former does only one iteration, with i = length(subspec2), whereas the latter iterates over every index. for (i in seq_along(subspec2)) does the same and is slightly safer, because it also behaves correctly when the list is empty.
(I am pretty sure that is your issue. Still, it is a great idea to create a reproducible example so others can run your code to double-check: for example, I am not sure exactly what subspec2 looks like, so I cannot run your code as it is. A great resource for this is the reprex package.)
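A small reproducible sketch of the corrected loop, using a hypothetical three-element subspec2 (the real one is not shown in the question). It iterates with seq_along() and extracts the winning column with [[ ]], so a plain vector rather than a one-column data frame is written into good.data. The sapply() line at the end is an optional loop-free equivalent that returns a matrix.

```r
# Hypothetical stand-in for subspec2: a list of numeric data frames
subspec2 <- list(
  data.frame(a = c(1, 2, 3), b = c(4, 9, 6)),
  data.frame(a = c(7, 1, 2), b = c(3, 5, 0)),
  data.frame(a = c(2, 2, 2), b = c(8, 1, 1))
)

# One row per observation, one column per data frame in the list
good.data <- data.frame(matrix(nrow = 3, ncol = length(subspec2)))

for (i in seq_along(subspec2)) {            # iterates i = 1, 2, 3
  col.max <- sapply(subspec2[[i]], max)     # maximum of each column
  max.name <- names(which.max(col.max))     # column with the largest maximum
  good.data[, i] <- subspec2[[i]][[max.name]]
}

# Loop-free alternative: returns a 3 x 3 matrix with the same values
good.matrix <- sapply(subspec2, function(d) d[[which.max(sapply(d, max))]])

print(good.data)
```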

Changing hundreds of column names simultaneously in R

I have a data frame with hundreds of columns whose names I want to change. I'm very new to R; the logic seems simple enough, but I simply can't find a relevant example online.
The closest I could sort of get was this:
projectFileAllCombinedNames <- for (i in 1:200){names(projectFileAllCombined)[i+1] <-variableNames[i]}
Basically, starting at the second column of projectFileAllCombined, I want to loop through the columns in the dataframe and assign them the data values in the second data frame. I was able to change one column name manually with this code:
colnames(projectFileAllCombined)[2]<-"newColumnName"
but I can't possibly do that for hundreds of columns. I've spent multiple hours on this and can't crack it with any number of Google searches on "change multiple columns in r" or "change column names in r". The best I can find online is examples where people change a few columns with a c() function and I get how that works, but that still seems to require typing out all the column names as parameters to the function, unless there is a way to just pass the "variableNames" file into that c() function, but I don't know of one.
Will
colnames(projectFileAllCombined)[-1] <- variableNames
not suffice?
This assumes the ordering of columns in projectFileAllCombined is the same as the ordering of the new variable names in variableNames, and that
length(variableNames) == (ncol(projectFileAllCombined) - 1)
The key point here is that the replacement function 'colnames<-'() is vectorised and can replace any number of column names in a single call if passed a vector of replacement values.
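A minimal sketch of the answer's one-liner on a hypothetical four-column data frame. Because the question describes the new names as coming from a second data frame, the sketch assumes variableNames might be a one-column data frame and pulls that column out as a character vector first; if variableNames is already a character vector, that step can be skipped.

```r
# Hypothetical data: an ID column followed by columns to rename
projectFileAllCombined <- data.frame(id = 1:2, V1 = c(0.5, 0.7),
                                     V2 = c(1, 2), V3 = c(3, 4))

# Hypothetical source of the new names, stored as a one-column data frame
variableNames <- data.frame(name = c("height", "weight", "age"),
                            stringsAsFactors = FALSE)

# Extract the names as a plain character vector
newNames <- as.character(variableNames[[1]])

# Guard against a length mismatch before renaming
stopifnot(length(newNames) == ncol(projectFileAllCombined) - 1)

# Replace every column name except the first in a single call
colnames(projectFileAllCombined)[-1] <- newNames

print(names(projectFileAllCombined))
```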
