Replacing df values in a column with values from another df via key - r

I need to replace values in the Nth column of my df (call these values v1s) with values from another df (call them v2s). There is a dictionary, or rather two dictionaries: the first translates v1s into numbers, and the second translates those numbers into v2s. I tried merge(), left_join()/right_join(), and a few other things, but nothing seems to work. Can somebody help, please?

Merging the datasets should work, as long as both share a key column to merge on.
Otherwise, if the rows of both datasets are in the same order, you can simply add an extra column to your dataset with
datasetA$newvar <- datasetB$v2s
Once the second variable has been added correctly, simply drop the first.
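The two-step translation described in the question can also be done without merging, by chaining two lookups with match(). The following is only a minimal sketch: the data frames df, dict1 and dict2 and their column names are hypothetical stand-ins, since the question does not show the real data.

```r
# Hypothetical data: df holds the v1s; dict1 maps v1 -> number; dict2 maps number -> v2
df    <- data.frame(id = 1:4, v1 = c("a", "b", "a", "c"), stringsAsFactors = FALSE)
dict1 <- data.frame(v1 = c("a", "b", "c"), num = c(10, 20, 30), stringsAsFactors = FALSE)
dict2 <- data.frame(num = c(10, 20, 30), v2 = c("alpha", "beta", "gamma"),
                    stringsAsFactors = FALSE)

# match() gives, for each value, its row position in the lookup table
num   <- dict1$num[match(df$v1, dict1$v1)]   # v1 -> number
df$v1 <- dict2$v2[match(num, dict2$num)]     # number -> v2, overwriting the column
```

Unlike merge(), which sorts the result by the key column by default, this keeps the rows of df in their original order. Values with no entry in a dictionary come out as NA.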

Related

Combine lapply and gsub to replace a list of values for another list of values

I am currently looking for a way to simplify searching through a column within a dataframe for a vector of values and replacing each of those values with another value (also contained within a separate vector). I can run a for loop for this, but it must be possible within the apply family; I'm just not seeing it yet. I'm very new to using the apply family and could use help.
So far, I've been able to have it replace all instances of the first value in my vector with the new first value in the new vector, it just isn't iterating past the first level. I hope this makes sense. Here is the code I have:
#standardize tank location
old_tank_list <- c("7.C.4","7.C.5","7.C.6","7.C.7","7.C.8","7.C.9","7.C.10","7.C.11")
new_tank_list <- c("7.B.3-4","7.C.3-4","7.C.1-2","7.C.5-6","7.C.7-8","7.C.9-10","7.E.9-10","7.C.11-12")
sapply(df_growth$Tank, function(y) gsub(old_tank_list, new_tank_list, y))
Tank is the name of the column I am trying to replace all of these values within. I haven't assigned it back yet, because I want to test the functionality first. Thanks for any help you can offer.
Hopefully, this image will help. The photo on the left is the column before my function is applied. The column on the right is after. Basically, I just want to batch change text values.
[Image: Before and After]
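library(dplyr)
# setNames() builds a named vector (names = old values, values = new values);
# !!! splices it into recode() as old = new pairs
df %>%
  mutate(Tank = recode(Tank, !!!setNames(new_tank_list, old_tank_list)))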
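The attempt in the question stops at the first value because gsub() uses only the first element of a pattern vector (and warns about it). It also treats the pattern as a regular expression, so "." matches any character. A base R sketch using exact matching with match() avoids both issues. The sample df_growth below is hypothetical, since the real data is not shown.

```r
old_tank_list <- c("7.C.4","7.C.5","7.C.6","7.C.7","7.C.8","7.C.9","7.C.10","7.C.11")
new_tank_list <- c("7.B.3-4","7.C.3-4","7.C.1-2","7.C.5-6","7.C.7-8","7.C.9-10","7.E.9-10","7.C.11-12")

# Hypothetical sample column; "7.A.1" has no replacement and should stay as it is
df_growth <- data.frame(Tank = c("7.C.4", "7.C.10", "7.A.1", "7.C.4"),
                        stringsAsFactors = FALSE)

# Position of each Tank value in the old list (NA where there is no match)
idx <- match(df_growth$Tank, old_tank_list)

# Keep unmatched values, swap in the new value everywhere else
replaced <- ifelse(is.na(idx), df_growth$Tank, new_tank_list[idx])
```

Once the result looks right, it can be assigned back with df_growth$Tank <- replaced.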

Role of square brackets

I got this code from elsewhere and I am wondering if someone can explain what the square brackets are doing.
matrix1[i,] <- df[[1]][]
I am using this to assign values to a matrix and it works but I am not sure what exactly it's doing. What does the initial set of [[]] mean followed by another []?
This might help you understand a bit. You can copy and paste this code to see the differences between the ways of indexing with [], [[]] and $. The only thing I can't answer for you is the second, empty set of square brackets. From my understanding, it does nothing unless a value is placed within those brackets.
#Retrieves the first column as a data frame
mtcars[1]
#Retrieves the first column values only (three different methods of doing the same thing)
mtcars[,1]
mtcars[[1]]
mtcars$mpg
#Retrieves the first row as a data frame
mtcars[1,]
#I can use a second set of brackets to get the 4th value within the first column
mtcars[[1]][4]
mtcars$mpg[4]
The general function of [ is that of subsetting, which is well documented both in help (as suggested in comments), and in this piece. The rest of my answer is heavily based on that source.
In fact, there are three operators for subsetting in R: [[, [, and $.
The [ and $ operators are useful for returning elements by index and by name, respectively. For example, the first three elements of the vector a = 1:10 can be subset with a[c(1,2,3)]. You can also subset negatively to remove elements: a[-1] drops the first element.
The $ operator is different in that it only takes element names as input, e.g. if your df was a dataframe with a column values, df$values would subset that column. You can achieve the same with [[, but only with a quoted name such as df[["values"]]; df["values"] returns a one-column data frame instead.
To answer more specifically, what does df[[1]][] do?
First, the [[-operator will return the 1st element from df, and the following empty [-operator will pull everything from that output.
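A small sketch of that last point, using a made-up two-column df and a made-up matrix1 (neither is shown in the question):

```r
# Hypothetical stand-ins for the objects in the question
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
matrix1 <- matrix(0, nrow = 2, ncol = 3)
i <- 1

col_vec <- df[[1]]     # [[ extracts the first column as a plain vector
col_all <- df[[1]][]   # the empty [ then selects every element of that vector
col_df  <- df[1]       # single [ keeps the data frame wrapper

# The assignment from the question: fill row i of the matrix with the column values
matrix1[i, ] <- df[[1]][]
```

Here col_vec and col_all are identical, so in this assignment the trailing [] could be dropped without changing the result.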

Finding a character variable in a column

I have a huge data frame df, with many columns. One of the columns named id_nm happens to be a character with values such as: aksh123dn.Ins
class(df$id_nm)
returns character
I need to look up all rows whose id_nm is, say, aksh123dn.Ins
I used:
new_df<-df[df$id_nm=='aksh123dn.Ins',]
This returns the entire df, which cannot be right.
I also tried:
new_df <- df %>% filter(id_nm == 'aksh123dn.Ins')
but I am still getting the same result.
I think it's possibly because it is a character string. Please help me with this. TIA

DataFrame nrows null after adding a column

I am using DataFrames in C++ code that I use in an R package. At some point, I want to add a column of ones at the front of my DataFrame. I do so as follows, where df is my DataFrame:
df.push_front(NumericVector(df.nrows(), 1.0), "(Intercept)");
After this, I expect df to contain its original columns plus a new one full of ones at the beginning. And it seems to be doing that.
But the problem is that now df.nrows() is equal to zero, and not what it used to be. Am I missing something?

Why does R think my imported vector of characters are numbers?

This is probably a basic question, but why does R think my vector, which has a bunch of words in it, contains numbers when I try to use it as column names?
I imported a data set and it turns out the first row of data are the column headers that I want. The column headers that came with the data set are wrong ones. So I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks
You need to actually show us the data, and the read.csv()/read.table() command you used to import.
If R thinks your numeric column is string, it sounds like that's because it wrongly includes the column name, i.e. you omitted header=TRUE in your read.csv()/read.table() import.
But show us your actual data and commands used.
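Without the actual data this can only be a sketch, but one common cause is that the columns were imported as factors: data[1,] is then a one-row data frame of factors, and assigning it to colnames() can yield the underlying integer codes rather than the labels. The file contents below are invented for illustration.

```r
# Hypothetical file: a wrong header line, then the real header, then the data
csv_text <- "V1,V2\nname,score\nann,1\nbob,2"

# Option 1: skip the wrong header at import so the real one is used
data <- read.csv(text = csv_text, skip = 1, header = TRUE)

# Option 2: data was already imported with the wrong header and factor columns
raw <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
# Convert each cell of the first row to its text label, not its factor code
new_names <- unname(vapply(raw[1, ], as.character, character(1)))
raw <- raw[-1, ]
colnames(raw) <- new_names
```

Option 1 is usually preferable, because the remaining columns are then parsed with their proper types (score becomes numeric instead of a factor).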
