Converting abbreviated "numbers" into numbers in R [duplicate]

This question already has answers here:
Converting 1M to 1000000 elegantly
(3 answers)
Closed 6 years ago.
I have a data frame in R that has monetary values such as $25,000 and $2,000,000 entered as 25K and 2M respectively. The data frame is massive, so is there any way I can, for example, change all of the 2M's to 2000000's?

Try gsub() on the letters:
df$variableName <- gsub("M", "000000", df$variableName)
df$variableName <- gsub("K", "000", df$variableName)
and so forth...
When you're done, convert the column to numeric: df$variableName <- as.numeric(df$variableName).
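One caveat with plain zero-padding is that a value like 1.5M would turn into "1.5000000". A minimal sketch of an alternative that multiplies by the suffix instead is below. The helper name abbr_to_number and the K/M/B multiplier table are assumptions for illustration, not something from the question.

```r
# Hypothetical helper: convert strings such as "25K", "2M" or "1.5M" to numbers.
abbr_to_number <- function(x) {
  # Drop dollar signs, thousands separators and surrounding whitespace.
  x <- toupper(trimws(gsub("[$,]", "", x)))
  # Assumed suffix table; extend it if your data uses other letters.
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  suffix <- sub("^-?[0-9.]+", "", x)            # trailing letter, "" if none
  value <- as.numeric(sub("[A-Z]+$", "", x))    # numeric part
  mult <- ifelse(suffix == "", 1, multipliers[suffix])
  value * mult
}

converted <- abbr_to_number(c("25K", "2M", "1.5M", "300"))
```

On a data frame this would be used as df$variableName <- abbr_to_number(df$variableName). Values with an unknown suffix come back as NA, which makes them easy to spot.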

Related

Conversion of datatype for multiple columns at once in R [duplicate]

This question already has answers here:
Change the class from factor to numeric of many columns in a data frame
(16 answers)
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 2 years ago.
I have a CSV file with around 60 variables that I have imported into R, and I'd like to convert the columns to data types such as numeric, POSIXt, and Boolean. In my previous files I only needed to do this for a limited set of columns, so I simply wrote lines like these 4-5 times:
data$var1 <- as.numeric(as.character(data$var1))
data$var2 <- as.numeric(as.character(data$var2))
Now, however, I want to convert 60 columns and would rather not write 60 lines of code over and over. Could someone help me come up with a more efficient way?
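One possible approach, sketched here with base R only, is to list the columns to convert and apply the same conversion to all of them with lapply(). The example data frame and the num_cols vector are made up for illustration.

```r
# Made-up example data: two factor columns holding numbers, one text column.
data <- data.frame(
  var1 = factor(c("1.5", "2", "10")),
  var2 = factor(c("3", "4", "5")),
  id   = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Names of the columns that should become numeric (an assumption; adjust as needed).
num_cols <- c("var1", "var2")

# Convert every listed column in one step; going through as.character()
# keeps the original values instead of the factor's internal codes.
data[num_cols] <- lapply(data[num_cols], function(col) as.numeric(as.character(col)))
```

The same pattern should work for other conversions, for example passing as.logical or a date-parsing function in place of the anonymous function.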

Select only numeric variables of a data frame in R [duplicate]

This question already has answers here:
Why does apply convert logicals in data frames to strings of 5 characters?
(2 answers)
Selecting only numeric columns from a data frame
(12 answers)
Closed 2 years ago.
I know that the question is very easy, but I have a more specific one:
I have a data frame, with 50 variables (numeric and non-numeric) and 5000 observations.
Now what I want to do is create another data frame containing only the numeric variables of the original one.
On this website I found the solution to my problem, which is:
numeric_variables<-unlist(lapply(original_data,is.numeric))
X<-original_data[numeric_variables]
But I was wondering: why does it not work if I try it like this instead? What's wrong?
numeric_variables2<-apply(original_data,2,is.numeric)
x<-original_data[numeric_variables2]
Try this:
names_num <- names(which(sapply(df, is.numeric)))
df_num <- df[, names_num, drop = FALSE]  # drop = FALSE keeps a data frame even if only one column is numeric
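As for why the apply() version fails: apply() first coerces the data frame to a matrix with as.matrix(), and a matrix can hold only one type, so a mix of numeric and non-numeric columns becomes all character and is.numeric() returns FALSE for every column. A minimal sketch with a made-up three-column data frame (the column names are hypothetical):

```r
# Made-up data frame with numeric and non-numeric columns.
original_data <- data.frame(
  a = 1:3,
  b = c("x", "y", "z"),
  c = c(0.5, 1.5, 2.5),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(original_data), which is a character matrix here,
# so no column is reported as numeric.
via_apply <- apply(original_data, 2, is.numeric)

# sapply() visits the columns of the data frame itself, so types are preserved.
via_sapply <- sapply(original_data, is.numeric)

# Keep only the numeric columns.
X <- original_data[via_sapply]
```

Here via_apply is FALSE for all three columns, while via_sapply picks out a and c.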

Subsetting an R Matrix [duplicate]

This question already has answers here:
Extracting specific columns from a data frame
(10 answers)
Closed 4 years ago.
In R programming, how do I subset a matrix so that I can skip columns in between? I only know how to do it for a contiguous range such as 1:4, but what if I want the first, second, and fourth columns?
You can select specific columns as follows:
new_df <- x[, c(1, 2, 4)]  # Select columns 1, 2 and 4
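For completeness, here is a small sketch of three equivalent ways to pick non-adjacent columns, using a made-up 4 x 5 matrix with hypothetical column names a to e:

```r
# Made-up matrix with named columns.
x <- matrix(1:20, nrow = 4, dimnames = list(NULL, c("a", "b", "c", "d", "e")))

by_position  <- x[, c(1, 2, 4)]          # pick columns by index
by_name      <- x[, c("a", "b", "d")]    # pick columns by name
by_exclusion <- x[, -c(3, 5)]            # drop the columns you do not want
```

All three give the same 4 x 3 matrix. If you might end up selecting a single column, adding drop = FALSE keeps the result a matrix rather than a vector.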

Find total frequency of number In Column [duplicate]

This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 5 years ago.
I am new to R. How do I find the frequency of each number in a column? The column contains multiple numbers, and I want the frequency of each one. I just want simple code. You can assume the data set is named Oct-TT. Thanks.
Here is the answer:
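df <- as.data.frame(sample(10:20, 20, replace = TRUE))  # example data: 20 random draws from 10 to 20
colnames(df) <- "Numbers"
View(df)  # optional: inspect the data in the viewer
as.data.frame(table(df$Numbers))  # one row per distinct number with its frequency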
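If you also want the most frequent numbers first, a small sketch with a fixed (made-up) vector, so the result is reproducible, could look like this:

```r
# Made-up column values.
numbers <- c(12, 15, 12, 10, 15, 12)

# Count each distinct value; naming the argument names the first output column.
freq <- as.data.frame(table(Numbers = numbers), stringsAsFactors = FALSE)

# Sort by frequency, highest first.
freq <- freq[order(-freq$Freq), ]
```

Note that a data set name such as Oct-TT is not a syntactic R name, so it would have to be written in backticks, for example table(`Oct-TT`$Numbers), assuming the column is called Numbers.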

Subsetting data frame based on values a particular column takes [duplicate]

This question already has answers here:
Subset multiple rows with condition
(3 answers)
Closed 8 years ago.
Here is a trivialized example whose solution would help me greatly.
v.1<- c(5,8,7,2)
v.2<- c("hi", "hello", "hum", "bo")
df<- data.frame(v.1, v.2)
desired.values<- c("hi", "bo")
I would like all rows of the dataset where v.2 takes on one of the desired.values.
Desired output:
5 "hi"
2 "bo"
In my real dataset, v.2 has more than 10000 values and desired.values contains more than 2000 values.
You could try data.table
library(data.table)
setkey(setDT(df),v.2)[desired.values]
Or using base R methods
df[df$v.2 %in% desired.values,]
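Or, if substring matches are acceptable (this pattern also matches values that merely contain "hi" or "bo"):
df[grep(paste(desired.values, collapse = "|"), df$v.2), ]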
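To show how the exact and pattern-based approaches can differ, here is a minimal sketch based on the question's data with one extra row ("this") added purely as an assumption for illustration:

```r
# Question's data plus one made-up row that contains "hi" as a substring.
v.1 <- c(5, 8, 7, 2, 9)
v.2 <- c("hi", "hello", "hum", "bo", "this")
df <- data.frame(v.1, v.2, stringsAsFactors = FALSE)
desired.values <- c("hi", "bo")

# Exact matching: keeps only rows whose v.2 equals one of the desired values.
exact <- df[df$v.2 %in% desired.values, ]

# Regular-expression matching: also keeps "this", because it contains "hi".
partial <- df[grepl(paste(desired.values, collapse = "|"), df$v.2), ]

# Anchoring the pattern restores exact matching.
pattern <- paste0("^(", paste(desired.values, collapse = "|"), ")$")
anchored <- df[grepl(pattern, df$v.2), ]
```

With thousands of desired values, %in% is likely the simpler choice, since it needs no pattern building and is not affected by regex special characters in the values.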
