Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am working on a project converting a bunch of Stata code to R to perform data cleaning. One of the things I'm trying to do is write a single R function that recodes all of my Yes/No variables, previously coded as (Yes = 1, No = 2), into standard dummy variables (Yes = 1, No = 0).
The thing is that the number of variables that need to be cleaned by this function will constantly be changing. So my guess is that the function will need to take as its arguments (1) the dataset/dataframe with all the variables, and (2) the list of variables that need to be cleaned.
Any help on this would be greatly appreciated, as I'm pretty new to R.
Thanks!
You could try this:
# Example data: q1 and q2 are coded Yes = 1, No = 2
example <- data.frame(sex = runif(10), q1 = rep.int(c(1, 2), 5), q2 = rep.int(c(2, 1), 5))
# Recode the named columns so that Yes = 1 and No = 0
yesno <- function(data, variables) {
  data.new <- data
  data.new[variables] <- 2 - data[variables]
  data.new
}
example
yesno(example, c("q1", "q2"))
sapply(data, function(x) {-x+2})
Here data contains your columns of 1s and 2s. The anonymous function turns every Yes (1) into 1 and every No (2) into 0.
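Note that sapply() returns a matrix rather than a data frame, and as written it recodes every column it is given. A minimal sketch that recodes only the chosen columns and keeps the data frame, assuming the Yes = 1 / No = 2 coding from the question (recode_yesno and the sample data are hypothetical, made up for illustration):

```r
# Hypothetical helper: recode selected Yes = 1 / No = 2 columns to 1/0 dummies
recode_yesno <- function(data, variables) {
  # lapply() returns a list of columns, which is assigned back into the data frame
  data[variables] <- lapply(data[variables], function(x) 2 - x)
  data
}

# Made-up sample data for illustration
survey <- data.frame(age = c(31, 45, 27), q1 = c(1, 2, 1), q2 = c(2, 2, 1))
cleaned <- recode_yesno(survey, c("q1", "q2"))
cleaned
```

Columns not named in variables (age here) are left untouched.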
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
Okay, so I'm taking logs of some distance variables, for example:
loghospital=log(hospital_2015_distance, base=exp(1))
This works; I get values that I can use in a regression.
However, for my LASSO regression it's better if I specify a dataset.
So I want a data frame of these log values.
Or, better, I want these log values added to my existing data frame, called data.
Any idea how this can be achieved? And if it can't, what else should I do to achieve the same result?
To add it to your data.frame you can use $:
data$loghospital = log(hospital_2015_distance, base=exp(1))
You could also use [[ or [, and you should probably use <- instead of = for assignment:
# Examples:
data[["loghospital"]] <- log(hospital_2015_distance, base=exp(1))
data["loghospital"] <- log(hospital_2015_distance, base=exp(1))
data[, "loghospital"] <- log(hospital_2015_distance, base=exp(1))
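If several distance variables need the same treatment, the assignment can be done in a loop. A minimal sketch, assuming the distance variables are already columns of data; school_2015_distance and all of the values are invented for illustration:

```r
# Made-up data; only hospital_2015_distance is a name taken from the question
data <- data.frame(
  hospital_2015_distance = c(1.2, 3.5, 10),
  school_2015_distance = c(0.4, 2.0, 7.5)  # hypothetical second variable
)

distance_vars <- c("hospital_2015_distance", "school_2015_distance")

# Add one log column per distance variable; log() uses base e by default
for (v in distance_vars) {
  data[[paste0("log_", v)]] <- log(data[[v]])
}

names(data)
```

Because log() defaults to the natural logarithm, base = exp(1) can be left out.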
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I need to write a function in R, since my implementation in other languages such as C++ runs very slowly. The function fills a 2D table with data and then summarizes the values of each row for further processing.
I am not sure if this answers your question, but if you work with data you can put it into a data frame to look at summary statistics and for further processing. For example:
df <- data.frame(var1 = c(5, 10, 15), var2 = c(20, 40, 60))
# summary() gives summary statistics for each column
summary(df)
# apply() with MARGIN = 1 works on the rows;
# in this example you get the mean of each row:
apply(df, 1, mean)
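For a purely numeric 2D table, a matrix with rowSums() or rowMeans() is usually faster than apply() over a data frame. A rough sketch of the fill-then-summarize pattern from the question, with arbitrary dimensions and values:

```r
# Fill a 3 x 4 table; the dimensions and the fill rule are arbitrary
tab <- matrix(0, nrow = 3, ncol = 4)
for (i in seq_len(nrow(tab))) {
  for (j in seq_len(ncol(tab))) {
    tab[i, j] <- i * j
  }
}

# Summarize each row for further processing
row_totals <- rowSums(tab)
row_means <- rowMeans(tab)
row_totals
```

Preallocating the matrix before the loop avoids growing it element by element, which is a common cause of slow R code.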
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have written the following code to compare two markets. The code works if we provide each data frame name individually.
for (i in seq_len(nrow(Market_SystemA))) {
  A <- Market_SystemA[i, 2]
  B <- Market_SystemB[i, 3]
  MarketA <- data.frame(A)
  MarketB <- data.frame(B)
  # This is a function in R
  Compare_Function(MarketA, MarketB)
}
I'm not sure I understand your question correctly, but it seems like you are calling Compare_Function on two strings that name existing data frames. To get the actual data frames from those strings, you need the get function, which looks up the object whose name matches the string.
MarketA <- get(A)
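Put together, the loop might look like the sketch below. It assumes the cells of Market_SystemA and Market_SystemB hold data frame names as strings, which the question does not confirm; sales_a, sales_b, the table layouts, and the stand-in Compare_Function are all invented for illustration.

```r
# Invented data frames to look up by name
sales_a <- data.frame(price = c(10, 12))
sales_b <- data.frame(price = c(10, 15))

# Assumed layout: column 2 of Market_SystemA and column 3 of Market_SystemB hold names
Market_SystemA <- data.frame(id = 1, name = "sales_a", stringsAsFactors = FALSE)
Market_SystemB <- data.frame(id = 1, other = NA, name = "sales_b", stringsAsFactors = FALSE)

# Stand-in for the asker's Compare_Function, whose real definition is unknown
Compare_Function <- function(a, b) identical(a, b)

results <- vector("list", nrow(Market_SystemA))
for (i in seq_len(nrow(Market_SystemA))) {
  MarketA <- get(Market_SystemA[i, 2])  # string -> data frame
  MarketB <- get(Market_SystemB[i, 3])
  results[[i]] <- Compare_Function(MarketA, MarketB)
}
results
```

Storing the data frames in a named list and indexing it by name is often easier to maintain than get(), since it does not depend on which objects happen to exist in the environment.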
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I would like to understand how this line really works:
y <- y[keep, , keep.lib.sizes=FALSE]
in:
keep <- rowSums(cpm(y)>1) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]
I know df[a, b], but I cannot find the R documentation for df[a, , b].
I tried searching for "brackets", "hooks", "commas"... :-(
(Sometimes I wish people would not simplify their R scripts!)
Thanks in advance.
Subscripting a data.frame takes two indices: df[rows, columns]. Leaving an index empty, as in the second position of y[keep, , ...], selects everything along that dimension. Anything after the two indices is an optional named argument that changes how the subsetting is done.
The most common of those is drop = FALSE, as in df[1:18, 3, drop = FALSE]. It is used because selecting a single column of a data.frame otherwise returns a plain vector rather than a data.frame. In your specific case, it looks like y is not a data.frame but a similar object with added functionality from a Bioconductor package (cpm() and keep.lib.sizes suggest a DGEList from edgeR). The help page for that class's subsetting method will tell you what the extra argument does.
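The effect of drop is easy to see on a small data frame. A minimal sketch with invented data:

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Default: selecting a single column simplifies the result to a plain vector
v <- df[1:2, 1]
class(v)

# drop = FALSE keeps the result as a one-column data frame
d <- df[1:2, 1, drop = FALSE]
class(d)

# An empty column index selects all columns
all_cols <- df[1:2, ]
dim(all_cols)
```

For data frames, the arguments accepted by the bracket are listed on the help page opened by ?"[.data.frame"; other classes document their own methods separately.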
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am new to R and have a data set containing a column with 3 states (1, 2, 3). The problem is that I don't know how to split the data set into the corresponding dummy variables in order to create box plots and, ultimately, a linear model.
Please help!! :'(
I think you can specify which feature is categorical, for example:
data<- read.csv(filename)
data$feature <- factor(data$feature)
Here feature is the column you want to convert to categorical data.
Is that what you are looking for?
If I understand your problem, you have 2 columns: one with the factor levels (1, 2, 3 in your example) and another with a response variable. Is that it? (An example with part of your data would be very helpful.) In any case, if your data has this structure, you don't need to split it. For a boxplot just run
boxplot(data$variable~data$factor)
You can use the same approach for a linear model:
lm(data$variable~data$factor)
If your data has a different structure, you will need to explain it before anyone can give further help.
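A minimal sketch of the factor approach on simulated data; the column names state and response are invented. Once the column is a factor, lm() builds the dummy variables itself, and model.matrix() shows them:

```r
set.seed(1)
# Simulated data: a three-state column and a numeric response
dat <- data.frame(
  state = rep(c(1, 2, 3), each = 10),
  response = rnorm(30)
)
dat$state <- factor(dat$state)

# Linear model; the first level (state 1) is the reference category
fit <- lm(response ~ state, data = dat)

# The dummy (indicator) columns that lm() created behind the scenes
dummies <- model.matrix(fit)
colnames(dummies)
```

With three states there are only two dummy columns, because the reference level is absorbed into the intercept.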