It's a data file containing information on all households. I want to create a new column, father_edu, for females aged 15 to 30, taking into account sbq02 (relationship to the household head).
I want to create the new column based on sbq02, sbq04, and age. For example, if sbq02 = daughter, sbq04 = female, and age is between 15 and 30, then the new column father_edu should take the value of scq04 from the household member with sbq02 = head and sbq04 = male.
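One possible approach in base R is to pull out each household's male head, then use match() to look up the head's scq04 for every qualifying daughter. This is a minimal sketch under assumptions that are not in the question: a household identifier column called `hhid` (hypothetical), text labels such as "head" and "daughter" (many survey files use numeric codes instead), and at most one male head per household.

```r
# Made-up household roster. `hhid` is a HYPOTHETICAL household id column;
# the labels "head", "daughter", "male", "female" are assumed codes.
df <- data.frame(
  hhid  = c(1, 1, 1, 2, 2, 2),
  sbq02 = c("head", "wife", "daughter", "head", "daughter", "son"),
  sbq04 = c("male", "female", "female", "male", "female", "male"),
  age   = c(50, 45, 20, 60, 35, 18),
  scq04 = c(12, 8, 14, 10, 16, 11)
)

# Education (scq04) of each household's male head, keyed by household id.
heads <- df[df$sbq02 == "head" & df$sbq04 == "male", c("hhid", "scq04")]

# Rows that should receive a value: female daughters aged 15 to 30.
target <- df$sbq02 == "daughter" & df$sbq04 == "female" &
  df$age >= 15 & df$age <= 30

# Everyone else keeps NA; match() finds the head of the same household.
df$father_edu <- NA
df$father_edu[target] <- heads$scq04[match(df$hhid[target], heads$hhid)]
df
```

If the codes are numeric, replace the string comparisons with the corresponding codes from the questionnaire.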
I have five tables, each with between 20 and 30 variables and 1,500 to 2,300 observations. I want to make a new dataset that is similar in values to the original dataset but not the same. For example, if one variable is gender, I want the new dataset to have gender data, but with different values.
I'm unsure which functions to use, or even how to search for suitable methods. My Google searches keep returning how to subset data, but I need new datasets with the same number of variables and observations, filled with random values based on the values in the source dataset.
Any advice would be helpful.
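One simple technique is column-wise resampling: draw each column independently, with replacement, from its own observed values. A minimal sketch in base R, assuming plain data frames; `synthesize` is a hypothetical helper name. Note the trade-off: each column's distribution is roughly kept, but relationships between columns (e.g. gender vs. age) are deliberately broken.

```r
# Build a synthetic copy of a data frame by resampling each column
# independently, with replacement, from its own observed values.
# Same columns, same types, same number of rows; marginal distributions
# are roughly preserved, but relationships BETWEEN columns are not.
synthesize <- function(data, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducibility
  n <- nrow(data)
  # `data[] <-` keeps the data-frame structure and column names
  data[] <- lapply(data, function(col) col[sample.int(n, n, replace = TRUE)])
  data
}

# Small made-up example table
original <- data.frame(
  gender = c("female", "male", "male", "female", "male"),
  age    = c(23, 35, 41, 29, 52)
)

fake <- synthesize(original, seed = 1)
fake
```

For the five tables, something like `lapply(list(t1, t2, t3, t4, t5), synthesize)` would work, where `t1` to `t5` stand for your own data frames. Resampled values still come from the observed set; if numeric columns should take values that never occurred in the source, adding noise (for example with `jitter()`) is one option.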
My goal is to manage both the variables and their values in a spreadsheet.
Basically, I want to be able to add the values for a new year in a new column and then load them into R.
I then want to assign to each variable named in the first column the corresponding value from either the second or the third column.
Input spreadsheet:

| Variable | Year2013 | Year2018 |
|---|---|---|
| age | 12 | 17 |
| pets | c(cat,dog,elephant) | c(dog,mouse) |
| cars | cars$name | cars$name |
Desired output for year 2013, derived from import("dataspreadsheet.csv"):
age <- 12
pets <- c(cat,dog,elephant)
cars <- cars$name
Is there any way to tell R to make this assignment?
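A minimal sketch of one way to do this in base R: read the sheet, evaluate each cell of the chosen year column as R code with `eval(parse(text = ...))`, and bind the result with `assign()`. The helper name `assign_year` is hypothetical, and it assumes every cell holds valid R code, so strings must be quoted (`c("cat", "dog")`, not `c(cat, dog)`). Because this executes whatever is in the file, it is only suitable for spreadsheets you trust.

```r
# Hypothetical stand-in for the spreadsheet; in practice something like
# spec <- read.csv("dataspreadsheet.csv", stringsAsFactors = FALSE).
# Assumption: each cell holds valid R code, so text values are quoted.
spec <- data.frame(
  Variable = c("age", "pets"),
  Year2013 = c("12", 'c("cat", "dog", "elephant")'),
  Year2018 = c("17", 'c("dog", "mouse")'),
  stringsAsFactors = FALSE
)

# For the chosen year column, evaluate each cell as R code and assign the
# result to the variable named in the Variable column.
assign_year <- function(spec, year_col, envir = parent.frame()) {
  force(envir)  # pin down the caller's environment before the loop
  for (i in seq_len(nrow(spec))) {
    value <- eval(parse(text = spec[[year_col]][i]), envir = envir)
    assign(spec$Variable[i], value, envir = envir)
  }
  invisible(NULL)
}

assign_year(spec, "Year2013")
age   # 12
pets  # "cat" "dog" "elephant"
```

A cell such as `cars$name` is evaluated in the same environment, so it only works if an object `cars` with a `name` column exists there (R's built-in `cars` dataset has only `speed` and `dist`).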
I have a data frame that consists of two columns: name and date.
Now I would like to add a new column, date 2, whose values should be the next business day after the date in the same row.
Is there a way to add the new column and directly specify that it should take the value from column 2 and find the next business day for column 3?
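A minimal vectorised sketch in base R, assuming the date column has class `Date`, that business days are Monday to Friday, and that public holidays can be ignored (handling those would need a holiday calendar). `next_business_day` and the column name `date2` are hypothetical. It uses the `wday` component of `as.POSIXlt()`, which is locale-independent, unlike `weekdays()`.

```r
# Next business day, assuming business days are Monday-Friday and
# ignoring public holidays.
next_business_day <- function(d) {
  wday <- as.POSIXlt(d)$wday              # 0 = Sunday, ..., 6 = Saturday
  step <- ifelse(wday == 5, 3,            # Friday   -> Monday
          ifelse(wday == 6, 2, 1))        # Saturday -> Monday, else +1 day
  d + step
}

df <- data.frame(
  name = c("a", "b", "c", "d"),
  date = as.Date(c("2024-01-04", "2024-01-05", "2024-01-06", "2024-01-07"))
)

# Vectorised: the whole date column is processed at once, no loop needed.
df$date2 <- next_business_day(df$date)
df
```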
I am practicing my R programming skills using Kaggle data sets, and I could use some help. I am working on the Ghosts, Ghouls, and Goblins data set, where the goal is to predict which type of monster each row represents based on a set of descriptive stats. I trained a multinomial logistic regression model on a training data set to get probability values for each of the 3 types, and now I just want to put the name of the monster in the last cell of each row in the test data set, based on the max probability across the 3 probability columns in that row. Here is the head of my table:
[Image: head of the predProbs table]
What I have tried so far seems to populate every cell in the type column with the same value. How can I find the max probability within the columns "Ghost", "Ghoul", and "Goblin", get the name of the column containing the max value, and then populate the last cell in every row (column name: type) with that name? I want to do this for every row in the test data set. This is what I am currently trying; afterwards I just cbind typesList with predProbs.
for (i in nrow(predProbs)) {typesList = append(typesList, which.max(apply(predProbs[i,7:9], MARGIN = 2, max)))}
But this doesn't seem to be creating the vector that I need. Any thoughts?
This is similar to this post: "find max value in a row and update new column with the max column name".
But, unfortunately, I'm not very fluent in SQL yet so I'm not able to translate it to R.
Any help would be greatly appreciated. Thanks!
-Wes
You could try something like this:
t(apply(predProbs,1,function(i)append(i,names(predProbs)[which.max(i)],length(i))))
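Two things go wrong in the loop from the question. `for (i in nrow(predProbs))` iterates over the single number `nrow(predProbs)`, so only the last row is visited (`seq_len(nrow(predProbs))` visits every row), and `which.max()` returns a position rather than a column name; cbind then recycles that one value down the whole column. Applying over the entire data frame can also turn every row into character if it has non-numeric columns. A vectorised alternative restricts the work to the three probability columns and uses base R's `max.col()`. A minimal sketch with made-up probabilities:

```r
# Toy stand-in for predProbs (made-up values); only the three probability
# columns matter for choosing the label.
predProbs <- data.frame(
  id     = c(1, 2, 3),
  Ghost  = c(0.70, 0.10, 0.20),
  Ghoul  = c(0.20, 0.60, 0.30),
  Goblin = c(0.10, 0.30, 0.50)
)

prob_cols <- c("Ghost", "Ghoul", "Goblin")

# max.col() gives, for every row, the index of the column holding the row
# maximum; ties.method = "first" makes ties deterministic.
best <- max.col(as.matrix(predProbs[prob_cols]), ties.method = "first")
predProbs$type <- prob_cols[best]
predProbs$type  # "Ghost" "Ghoul" "Goblin"
```

Selecting columns by name rather than by position (7:9) keeps this working if the column order changes.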
I'll describe my data:
The first column contains corine_values, ranging from 1 to 50.
The second column contains bird_names; there are 70 different bird_names, and each corine_value has several bird_names.
The third column contains the sex associated with each bird_name.
The fourth column contains a V1 value (a measurement) belonging to the category described by the first three columns.
I want to create a table where the row names are the bird_names: first all the females in alphabetical order, followed by the males in alphabetical order. The column names should be the corine_values, from smallest to largest. The data in the table should be the corresponding V1 values.
I've been trying some things, but to be honest I'm just starting with R and I don't really have a clue how to do it. I can sort the data, but not on multiple levels (like alphabetical and sex combined). I'm exporting everything to Excel now and doing it manually, which is very time-consuming.
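A minimal base-R sketch of this long-to-wide reshape, with made-up data. Sorting on two levels is done by passing both keys to `order()`, with sex as a factor whose levels put "female" first. Because the same bird_name can occur for both sexes, the row labels here combine sex and name (an assumption about how you want duplicates labelled). It also assumes at most one V1 per sex/bird/corine combination; a later duplicate would overwrite an earlier one.

```r
# Made-up long-format data with the four columns described above.
birds <- data.frame(
  corine_value = c(1, 1, 2, 2, 3, 1),
  bird_name    = c("robin", "wren", "robin", "wren", "robin", "robin"),
  sex          = c("female", "male", "female", "male", "male", "male"),
  V1           = c(0.4, 1.2, 0.7, 0.9, 1.5, 0.3)
)

# Sort on two levels: sex first (females before males), then bird name.
birds$sex <- factor(birds$sex, levels = c("female", "male"))
birds <- birds[order(birds$sex, birds$bird_name), ]

# Row labels combine sex and name; columns are corine_values, small to large.
key     <- paste(birds$sex, birds$bird_name, sep = "_")
row_key <- unique(key)
col_key <- sort(unique(birds$corine_value))

# Empty wide table (NA = no measurement), then drop each V1 into its cell
# using an integer (row, column) index matrix.
wide <- matrix(NA_real_, nrow = length(row_key), ncol = length(col_key),
               dimnames = list(row_key, as.character(col_key)))
wide[cbind(match(key, row_key), match(birds$corine_value, col_key))] <- birds$V1
wide
```

The result can be written out for Excel with `write.csv(wide, "birds_wide.csv")` (the file name is just an example). If you use the tidyverse, `tidyr::pivot_wider()` performs the same kind of reshape.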