I have a 10 (question items) by 500 (respondents) vector in R.
The upper 250 rows are male and the lower 250 rows are female.
How can I create a gender variable and assign 0 and 1 to it based on row numbers in R?
Thank you very much! Stay safe.
This solution assumes your dataset is in a data frame, not a vector, that the dataset is named "dat" (change it to whatever you are calling your data), and that the variable "gender" does not already exist in "dat".
dat$gender <- NA # Creates a new, empty column in the dataset (NA stands for missing data, or not available)
dat[1:250, "gender"] <- "0" # assigns the category 0 to rows 1-250
dat[251:500, "gender"] <- "1" # assigns the category 1 to rows 251-500
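A shorter way to build the same column is sketched below. It uses numeric codes rather than the strings "0" and "1", and the simulated dat is only a hypothetical stand-in for the real questionnaire data (500 rows of respondents, 10 item columns):

```r
# Hypothetical stand-in for the real data: 500 respondents (rows) x 10 items
dat <- as.data.frame(matrix(sample(1:5, 500 * 10, replace = TRUE), nrow = 500))

# 0 for rows 1-250 (male), 1 for rows 251-500 (female)
dat$gender <- ifelse(seq_len(nrow(dat)) <= 250, 0, 1)

# Equivalent when the split is exactly 250/250:
# dat$gender <- rep(c(0, 1), each = 250)
```

If labels are preferred over codes, factor(dat$gender, levels = c(0, 1), labels = c("male", "female")) converts the result.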
Hope this helps! As the comments suggest, providing a sample of your data will help us help you.
I have a .csv file of 39 variables and 713 rows, each containing a count of plastic items. Another column holds the survey length, and I want to standardise each count of items to a survey length of 100. I am unsure how to write a loop that runs through each row and cell individually to do this. Many cells also contain NA values.
Any ideas would be great.
Thank you.
Consider applying the formula directly to the columns, without any need for a loop:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
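A small worked example of the same idea, using a hypothetical three-row df with made-up count columns bottles and bags. Here the data frame is divided by the survey_length vector directly: R recycles that vector down each column, so row i of every count column is divided by survey_length[i] and the helper matrix is not needed. NA counts simply stay NA:

```r
# Hypothetical data: two count columns plus the survey length per row
df <- data.frame(
  bottles = c(4, NA, 10),
  bags = c(2, 6, NA),
  survey_length = c(50, 200, 100)
)

# Every column except survey_length
vars <- setdiff(names(df), "survey_length")

# Standardise each count to a survey length of 100
df[vars] <- df[vars] / df$survey_length * 100
df
```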
My initial dataset was a csv containing the number of bikes rented in a certain city, with other variables such as temperature, season, etc.
I was creating a subset based on conditions, keeping the rows where saison is 3 or 4 and annee is 1. I tried the following:
P<- subset(velo,saison>2&annee==1)
I also tried
W<- velo[which(velo$annee==1 & velo$saison>2),]
Both returned the same subset: a data frame of 183 observations of 5 variables.
I then wanted to summarise the data with:
summary(W$velos[saison==3])
summary(W$velos[saison==4])
It gives me the following outputs:
In the data set I can see that the saison column is not full of NaN, and class() returns "integer" for that column.
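The issue was that saison was not extracted from W. Inside W$velos[...], a bare saison is not looked up among the columns of W, so the column has to be referenced explicitly: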
summary(W$velos[W$saison==3])
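To illustrate, a sketch with a hypothetical five-row W (only the saison and velos columns, with made-up values). Besides indexing with W$saison, with() evaluates the expression inside W so that bare column names resolve there, and tapply() produces one summary per season in a single call:

```r
# Hypothetical stand-in for W: rented bikes (velos) by season (saison)
W <- data.frame(
  saison = c(3L, 3L, 4L, 4L, 4L),
  velos  = c(120, 180, 90, 60, 75)
)

# Refer to the column explicitly...
s3 <- summary(W$velos[W$saison == 3])

# ...or evaluate the expression inside W so bare column names resolve
s4 <- with(W, summary(velos[saison == 4]))

# One summary per season in a single call
by_season <- tapply(W$velos, W$saison, summary)
by_season
```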
My data set contains information about smartphones.
To fit a random forest, I need to convert my factor Brand into dummy variables.
I tried this code
m <- model.matrix( ~ Brand, data = data_price)
Intercept BrandApple BrandAcer BrandAlcatel ...
1 0 0 1
1 1 0 0
...
The problem is that the original data has 2039 rows, while the output of this only has 2038.
Now I want to add the dummies to data_price, but this doesn't work.
How can I create the dummies and add them to my data set?
Your approach using model.matrix should work fine; we only need to figure out what happened to the missing row. My guess is that your factor contains missing values. Consider the following:
dat <- factor(mtcars$cyl)
dat2 <- dat
dat2[1] <- NA
Here, I have taken a factor, namely the number of cylinders in the mtcars dataset, and for comparison I have created a second factor where I have replaced one value with NA. Let's look at the number of rows that model.matrix will spit out in each case:
nrow(model.matrix(~dat))
[1] 32
nrow(model.matrix(~dat2))
[1] 31
You can see that when the factor had a missing value, the output of model.matrix had one row fewer. This is expected: model.matrix builds its model frame with the default na.action (normally na.omit), which drops rows containing NA.
You can either create a separate factor level for the missing values or, if it is appropriate for your application, drop the rows with missing values from your original data set. The output of model.matrix keeps the row names of the rows it retained, which you can use to merge the dummies back onto the original data frame if you take the second route.
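Both routes, sketched on a hypothetical five-row data_price with one missing Brand (the values are made up). addNA() turns NA into an explicit factor level so model.matrix keeps every row; alternatively, the row names of the model matrix identify which original rows survived, so the dummies can be bound back onto exactly those rows:

```r
# Hypothetical stand-in for data_price with one missing Brand
data_price <- data.frame(
  Brand = factor(c("Apple", "Acer", NA, "Alcatel", "Apple")),
  Price = c(999, 450, 300, 150, 899)
)

# Route 1: keep NA as its own level, so no row is dropped
m_all <- model.matrix(~ addNA(Brand), data = data_price)

# Route 2: let model.matrix drop the incomplete row, then use its
# row names to select the matching rows of the original data
m <- model.matrix(~ Brand, data = data_price)
with_dummies <- cbind(data_price[rownames(m), ],
                      m[, -1, drop = FALSE])  # drop the intercept column
with_dummies
```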
I have a one-column xts object:
a <- c(1,1,1,2,3,2,2,2,2,1,0,0,0,0,2,3,4,4,1,1)
date <- Sys.Date()-20:1
data <- xts(a,date)
colnames(data) <- "a"
data
I want every non-zero value in column a to be replaced by alternating +1 and -1, leaving the zeros unchanged. The a column should look like:
1,-1,1,-1,1,-1,1,-1,1,-1,0,0,0,0,1,-1,1,-1,1,-1
I've asked similar questions, but this is not an exact duplicate.
Assuming your data frame is named df, this fills in alternating values 1 and -1 for every row where a is not 0 (a here is the vector a from the question):
df[a!=0,]<-c(1,-1)
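The one-liner relies on R recycling c(1, -1) over however many non-zero entries there are. A sketch on the plain vector a from the question that makes the alternation explicit with rep(..., length.out =), so no "not a multiple of replacement length" warning appears when the number of non-zero values is odd:

```r
a <- c(1,1,1,2,3,2,2,2,2,1,0,0,0,0,2,3,4,4,1,1)

nz <- a != 0                                   # positions to overwrite
a[nz] <- rep(c(1, -1), length.out = sum(nz))   # alternate +1/-1 across them
a
```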
I am new to R and have a fairly simple question that I just can't figure out. For my example I will use a data frame with 3 columns, but my actual data set has 139 columns and 10000 rows.
I want to replace all of the values in a given row with NA if the value in column C of that row is less than 10.
Assume that all of my columns contain numeric or integer values.
So I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
How about:
x[x$C <10 ,] <- NA
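One caveat, sketched on the example data: if column C itself contains NA, the logical index x$C < 10 contains NA as well, and the data frame assignment stops with a "missing values are not allowed" error. Wrapping the condition in which() keeps only the rows where the comparison is TRUE:

```r
x <- data.frame(A = c(5, 9, 2), B = c(3, 4, 6), C = c(12, 9, 11))

# which() drops FALSE and NA results, so rows with a missing C are left untouched
x[which(x$C < 10), ] <- NA
x
```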