Replace an entry in data frame by the number of the column - r

I want to find entries in an R dataframe based on their value in order to be able to replace them by the number of the column each of these entries is located in. Well, it's easy to modify particular entries based on their location or based on their value. Let's say this would replace all zeros in the data frame with 1:
df[df==0]<-1
But how do you replace all zeros in your df by the number of the column they're in?

df[df==0] <- which(df==0, arr.ind = TRUE)[,2]

df[] <- lapply(1:ncol(df), function(i) {
  ifelse(df[, i] != 0, df[, i], i)
})
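To check that the two answers above agree, here is a minimal sketch on a toy data frame. The data frame and the names df1 and df2 are made up for illustration and are not from the question.

```r
# Hypothetical toy data frame, not from the original question
df <- data.frame(a = c(0, 5, 0), b = c(3, 0, 7), c = c(0, 0, 9))

# Approach 1: logical-matrix indexing plus which(arr.ind = TRUE).
# Both sides walk the zeros in column-major order, so the values line up.
df1 <- df
df1[df1 == 0] <- which(df1 == 0, arr.ind = TRUE)[, "col"]

# Approach 2: column-wise replacement with lapply() and ifelse()
df2 <- df
df2[] <- lapply(seq_along(df2), function(i) ifelse(df2[[i]] != 0, df2[[i]], i))

print(df1)
print(all(df1 == df2))
```

Note that both versions assume the data frame has no NA values; with NAs, the comparison df == 0 returns NA and would need extra handling.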

Related

Getting only the rownames containing a specific character - R

I have a Seurat R object and would like to select only the data corresponding to a specific sample. To do that, I want to get only the row names that contain a specific character. Example row names: CTAAGCTT-1 and CGTAAAT-2; I want to differentiate based on the 1 and 2. The code below shows what I have already tried, but it just returns the total number of rows, not how many rows match the character.
length <- length(rownames(seuratObject@meta.data) %in% "1")
OR
length <- length(grepl("-1",rownames(seuratObj@meta.data)))
Idents(seuratObject, cells = 1:length)
Thanks for any input.
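You are just missing which(). grepl() returns one TRUE/FALSE per row name, so length() of that result is always the total number of rows; which() keeps only the positions of the TRUE values.
length(which(grepl("-1", rownames(seuratObject@meta.data))))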
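An equivalent way to count matches is sum() on the logical vector, and the same vector can be used to pull out the matching names. The sketch below uses a made-up character vector in place of the Seurat row names, so barcodes, is_sample1, n_sample1 and cells_sample1 are all hypothetical names.

```r
# Hypothetical cell barcodes standing in for the row names of the meta data
barcodes <- c("CTAAGCTT-1", "CGTAAAT-2", "GGCATTCA-1", "TTAGGCAA-2")

# grepl() returns one TRUE/FALSE per barcode; "-1$" anchors the suffix at the end
is_sample1 <- grepl("-1$", barcodes)

n_sample1 <- sum(is_sample1)           # number of matching barcodes
cells_sample1 <- barcodes[is_sample1]  # the matching barcodes themselves

print(n_sample1)
print(cells_sample1)
```

Anchoring the pattern with $ is a design choice: an unanchored "-1" would also match suffixes such as -10 or -11 if they exist.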

Randomly selecting dataframe column. Avoid sampling same column again

Is there a way to randomly pick a column in a data frame and then avoid picking it again? This picks a random column:
random_data_vector = data[, sample(ncol(data), 1)]
but I'm not sure how to avoid picking the column again. I thought about removing the column completely but there might be a better approach
You can first sample the columns with
random_cols <- sample(ncol(data))
and then select the random vectors like this
random_data_vector1 <- data[, random_cols[1]]
random_data_vector2 <- data[, random_cols[2]]
The default setting of sample is replace = FALSE, thus in the random_cols vector you won't have duplicated numbers and you won't select one column twice.
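If every column should be visited once in random order, the shuffled indices can simply be looped over. This is a minimal sketch with a made-up data frame; set.seed() is only there to make the run reproducible.

```r
set.seed(42)  # only to make the sketch reproducible

# Hypothetical data frame, not from the original question
data <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# sample(n) without a size returns a random permutation of 1..n
random_cols <- sample(ncol(data))

for (j in random_cols) {
  random_data_vector <- data[[j]]  # each column is picked exactly once
  cat("column", names(data)[j], ":", random_data_vector, "\n")
}
```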

Imputing missing character values in R

I have a dataset called credit_df with dimensions 32561*15. It has a column native.country with 1843 missing values. Missing values are given as ?
I have created a factor variable with the list of countries using the below code
country <- unique(credit_df$native.country)
The result of the above code also included one ? value, as it was part of the dataset, so I removed just that value using the code below
country <- as.data.frame(country)
country %>% filter(country != "?")
Now the country factor variable has all the country names in the dataset, and I would like to assign those randomly to the missing values in the column. How do I do it?
I tried the below code per one of the suggested methods
credit_df$native.country[credit_df$native.country %in% c("?")] <-
sample(country, NROW(credit_df$native.country[credit_df$native.country %in% c("?")]), replace = T)
but all the "?" turned out to be missing values
sum(is.na(credit_df$native.country))
[1] 583
NOTE: Even setting this example aside, I would be happy with any suggestion on how to impute character values randomly.
Example: I have a country column with missing values, and I have a vector/dataframe with a bunch of country names. How do I assign them randomly to the missing values in the country column?
You could try using sample():
credit_df$native.country[credit_df$native.country %in% c("?")] <-
  sample(country, NROW(credit_df$native.country[credit_df$native.country %in% c("?")]), replace = TRUE)
The sample() call here creates a vector of random values drawn from country, which is assumed to be a plain vector of country names rather than a data frame. The length of the generated vector equals the number of entries you want to replace. The replace = TRUE argument is strictly required only if you want to draw a sample larger than the population (I did not know how many rows there are to replace or how many values there are in country).
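Here is a minimal, self-contained sketch of the same idea on a made-up column. It assumes native.country is a character column and builds country as a plain character vector; a possible cause of the NAs in the question is that country had been turned into a data frame, or that the column was a factor without matching levels. The name missing_idx is hypothetical.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical stand-in for credit_df; the column is kept as character
credit_df <- data.frame(
  native.country = c("India", "?", "Cuba", "?", "Mexico", "India"),
  stringsAsFactors = FALSE
)

# Keep the pool as a plain character vector, without the "?" placeholder
country <- setdiff(unique(credit_df$native.country), "?")

# Logical index of the entries to impute
missing_idx <- credit_df$native.country == "?"

# Draw one random country per missing entry
credit_df$native.country[missing_idx] <-
  sample(country, sum(missing_idx), replace = TRUE)

print(credit_df)
```

If the real column is a factor, converting it first with as.character() avoids assigning values that are not among its levels.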

Assigning name to rows in R

I would like to assign names to rows in R but so far I have only found ways to assign names to columns. My data is in two columns where the first column (geo) is assigned with the name of the specific location I'm investigating and the second column (skada) is the observed value at that specific location. To clarify, I want to be able to assign names for every location instead of just having them all in one .txt file so that the data is easier to work with. Anyone with more experience than me that knows how to handle this in R?
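First you need to import the data into your global environment; try the function read.table().
To name the rows
(assuming your data.frame is named df and geo is its first column):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]  # drop = FALSE keeps a data.frame when only one column remains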
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help you can see the parameter row.names description
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means you can either specify the row names manually when you create the data.frame or name the column that contains them. The former can be achieved as follows
d = data.frame(x = rnorm(10),             # 10 normally distributed random values
               y = rnorm(10),             # 10 normally distributed random values
               row.names = letters[1:10]  # use the first 10 letters as row names
               )
while the latter is
d = data.frame(x = rnorm(10),      # 10 normally distributed random values
               y = rnorm(10),      # 10 normally distributed random values
               r = letters[1:10],  # take the first 10 letters
               row.names = 3       # the column holding the row names is the 3rd
               )
If you are reading the data from a file, I assume you are using read.table(). Many of its parameters are the same as those of data.frame(); in particular, its row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
Finally, if you have already read the data.frame and you want to change the row names, the rownames(df) <- ... assignment from the other answer is your solution.
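As a minimal sketch of the read.table() route, the row names can be taken from the geo column at import time. The file contents below are invented for illustration (only the column names geo and skada come from the question), and text = is used in place of a file path so the example runs on its own.

```r
# Hypothetical file contents; only the column names come from the question
txt <- "geo skada
Stockholm 12
Uppsala 7
Lund 3"

# row.names = "geo" uses that column as row names and drops it from the data
df <- read.table(text = txt, header = TRUE, row.names = "geo")

print(df)
print(df["Uppsala", "skada"])  # look up a value by location name
```

Row names must be unique, so this only works if each location appears once in the file.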

How to replace all of specific entry in a column with new entry

I have a data frame with a column that is filled with string entries of {A,B,C}, but want to replace all entries of A with B.
What function would be best to do this? Thanks, I'm still an R newbie!
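gsub() lets you replace pattern matches in a vector, as in the following small example. Note that it replaces every occurrence of the pattern inside each string, not only entries that equal it exactly.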
df = data.frame(sample(100,12), letters[1:3])
df[,2] = gsub("a","b",df[,2])
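If only entries that are exactly equal to A should change, a plain logical index may be a safer alternative to pattern matching. This is a sketch with a made-up data frame; the column names value and grp are hypothetical, and the column is assumed to be character rather than factor.

```r
# Hypothetical data frame; "AA" is included to show the difference from gsub()
df <- data.frame(value = c(10, 20, 30, 40),
                 grp = c("A", "B", "C", "AA"),
                 stringsAsFactors = FALSE)

# What the pattern-based approach would produce: "AA" becomes "BB"
pattern_based <- gsub("A", "B", df$grp)

# Exact-match replacement: only entries equal to "A" are changed
df$grp[df$grp == "A"] <- "B"

print(pattern_based)
print(df$grp)
```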
