I have a data frame (originally from a CSV file) with the columns NAME and YEAR. I have extracted a sample from this data frame of the first ten entries like so:
sample <- df[1:10, ]
I want to know the frequency of the values in the NAME column so I input the following:
as.data.frame(table(sample$NAME))
This counts the frequencies in the sample correctly, but the 'Var1' column also includes every name from the original data frame (each with a Freq of 0).
The same thing happens with unique(sample$NAME): it lists the names from the sample along with all of the names from the original data frame.
What am I doing wrong?
This could be a case of unused levels in the 'NAME' factor column: subsetting a factor keeps all of its original levels, even those that no longer occur. We can use droplevels or call factor again to remove the unused levels.
as.data.frame(table(droplevels(sample$NAME)))
Or
as.data.frame(table(factor(sample$NAME)))
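To make the effect visible, here is a minimal sketch on made-up data (the names and years are invented for illustration). NAME is created as a factor explicitly, because since R 4.0 read.csv and data.frame no longer convert strings to factors by default.

```r
# Made-up data standing in for the CSV; NAME is deliberately a factor
df <- data.frame(
  NAME = factor(c("Ann", "Bob", "Ann", "Cid")),
  YEAR = c(2001, 2002, 2003, 2004)
)

sample <- df[1:3, ]           # "Cid" is no longer present in the subset
levels(sample$NAME)           # but "Cid" is still listed as a level

# table() counts every level, so "Cid" shows up with a Freq of 0
freq_all <- as.data.frame(table(sample$NAME))

# Dropping the unused levels first leaves only the names actually present
freq_used <- as.data.frame(table(droplevels(sample$NAME)))

# droplevels() also accepts a whole data frame and cleans every factor column
sample <- droplevels(sample)
```

Calling droplevels on the whole data frame right after subsetting avoids having to repeat it in every later call.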
I am trying to add a column from one dataframe to another. The data is long repeated measures data, with each ID having two rows. Both my main dataset (d) and my secondary dataset (d2) use the same column (ID) to link cases to participants. However, when I use mutate like this d <- mutate(d, x = d2$x) the column binds to the dataframe but the values are not tied to the ID. This means that data gets mixed up between participants.
Is there a way to make sure that the values are referenced by ID when I add the column?
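One way to approach this is a keyed join instead of mutate, since d2$x is attached purely by row position. The sketch below uses base R merge and assumes a second key column, here called time, that distinguishes the two rows of each ID. That column is hypothetical and not mentioned in the question; without some such column, joining on ID alone would pair every row of an ID with both rows in d2.

```r
# Made-up long data: two rows per ID, told apart by a hypothetical 'time' column
d  <- data.frame(ID = c(1, 1, 2, 2), time = c(1, 2, 1, 2), y = c(10, 11, 20, 21))
d2 <- data.frame(ID = c(2, 2, 1, 1), time = c(2, 1, 2, 1),
                 x = c("d", "c", "b", "a"))   # note the different row order

# Match rows on both keys; all.x = TRUE keeps every row of d even without a match
d <- merge(d, d2[, c("ID", "time", "x")], by = c("ID", "time"), all.x = TRUE)
```

Note that merge sorts the result by the key columns. With dplyr, left_join(d, d2, by = c("ID", "time")) does the same matching while keeping the row order of d.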
I am working on a data set for analysis.
I have to remove the column name of a single variable of a data frame in R.
I have used colnames(df) <- NULL, but it removes all column names of the data frame.
The expected output is:
I don't really understand the expected output, but maybe this is what you're looking for:
colnames(df)[i] <- NA
Here 'i' is the position of the column (for example 1, 2, 3, etc.).
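As a small illustration on a made-up data frame, the assignment changes only the name at position i and leaves the others alone. Using an empty string instead of NA is shown as an alternative; which of the two suits the intended output is an assumption, since the expected output is not shown.

```r
# Made-up data frame with three named columns
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

i <- 2
colnames(df)[i] <- NA      # only the second name is replaced
names_na <- colnames(df)   # c("a", NA, "c")

# Alternative: a blank name instead of NA
df2 <- data.frame(a = 1:2, b = 3:4, c = 5:6)
colnames(df2)[i] <- ""
names_blank <- colnames(df2)
```

A column without a proper name can no longer be reached with df$name, so this is mostly useful just before printing or exporting.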
I have two data frames, one containing the predictors and one containing the different categories I want to predict. Both data frames contain a column named geoid. Some of the rows of my predictors contain NA values, and I need to remove these.
After extracting the geoid values of the rows containing NA values and removing those rows from the predictors data frame, I need to remove the corresponding rows from the categories data frame as well.
It seems like a rather basic operation, but the code won't work.
categories <- as.data.frame(read.csv("files/cat_df.csv"))
predictors <- as.data.frame(read.csv("files/radius_100.csv"))
NA_rows <- predictors[!complete.cases(predictors),]
geoids <- NA_rows['geoid']
clean_categories <- categories[!(categories$geoid %in% geoids),]
None of the rows in categories/clean_categories are removed.
A typical geoid value is US06140231. typeof(categories$geoid) returns integer.
I can't say for sure this is the problem, but there is a very basic typo that keeps the line from doing what you want. Try this correction:
clean_categories <- categories[!(categories$geoid %in% geoids),]
Almost certainly this is what you meant to happen in that line. You want to negate the result of the %in% operator. You don't include a reproducible example so I can't say whether the whole thing will do as you want.
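Another possible cause, offered as a guess since there is no reproducible example: NA_rows['geoid'] returns a one-column data frame rather than a vector, so %in% does not compare against the individual geoid values. Extracting the column with [[ ]] or $ gives a plain vector. The sketch below uses made-up data in place of the two CSV files; the radius and category columns are invented for illustration.

```r
# Made-up stand-ins for the two CSV files
predictors <- data.frame(
  geoid  = c("US01", "US02", "US03"),
  radius = c(1.5, NA, 2.0),
  stringsAsFactors = FALSE
)
categories <- data.frame(
  geoid    = c("US01", "US02", "US03"),
  category = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Rows of predictors that contain at least one NA
NA_rows <- predictors[!complete.cases(predictors), ]

# [[ ]] returns a plain vector; NA_rows["geoid"] would return a data frame
geoids <- NA_rows[["geoid"]]

# Drop the incomplete rows, and the matching geoids from categories
clean_predictors <- predictors[complete.cases(predictors), ]
clean_categories <- categories[!(categories$geoid %in% geoids), ]
```

Since typeof(categories$geoid) returns integer for values like US06140231, the column is most likely a factor; comparing as.character(categories$geoid) against as.character(geoids) is the safer form in that case.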
I have 10 topics. For each topic name I have a results_topic_df data frame. This data frame has 2 columns: index, which is the name of another data frame, and var_name, which is the name of a variable inside the corresponding data frame (indicated by index).
What I want to do is take the corresponding original data frame (whose name is given by results_topic_df$index), look at the value of results_topic_df$var_name in the same row, go to the original data frame, and copy the relevant variable to a data frame named container_df.
Eventually I will have container_df having only the selected variables from all the data frames that appear in results_topic_df.
I want to repeat this procedure for each one of the 10 topics.
I have tried to do this with a loop, but because my data frames' names change, I got really confused with all the combinations of assign(), paste0(), and eval(). Is there a simpler way to accomplish my goal? Thanks.
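One possible simplification, sketched here on made-up data, is to look each data frame up by name with get() and collect the selected columns in a list, which avoids assign() and eval() entirely. The data frames df_a and df_b and their columns are hypothetical, and the sketch assumes all source data frames have the same number of rows so the columns can sit side by side.

```r
# Hypothetical source data frames, named as they would appear in 'index'
df_a <- data.frame(x = 1:3, y = 4:6)
df_b <- data.frame(z = 7:9)

results_topic_df <- data.frame(
  index    = c("df_a", "df_b"),
  var_name = c("y", "z"),
  stringsAsFactors = FALSE
)

# For each row, fetch the data frame by name and pull out the named column
cols <- Map(function(idx, var) get(idx)[[var]],
            results_topic_df$index, results_topic_df$var_name)

# Label each column as <data frame>.<variable> so the names stay unique
names(cols) <- paste(results_topic_df$index, results_topic_df$var_name, sep = ".")

container_df <- as.data.frame(cols)
```

To repeat this for all 10 topics, the same steps can be wrapped in a function and applied with lapply over a named list holding the 10 results data frames, giving one container data frame per topic.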
Please pardon my ignorance, since I am a beginner in R.
In R I am transposing a data frame (certain rows to columns) and saving the result back to a data frame, which is exactly what I need.
But the name of the 1st column is missing, and I need it to join the result with other data frames.
Data frame result and function used:
dish_pair<-as.data.frame.matrix(xtabs(count~primary_id+subcategory_name, dishes))
Output:
But how can I get primary_id as the name of the 1st column,
which holds the row values 50792, 50793?
(I just need the 1st column to be named primary_id; the remaining data frame values are correct.)
The first column in the picture is not a real column, just the row names. You need to add it to the data frame as a column and give that column a name:
dish_pair <- data.frame(primary_id=rownames(dish_pair), dish_pair)
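A self-contained sketch of the whole step on made-up data (the subcategory names and counts are invented). It adds row.names = NULL, which is an addition to the line above and simply resets the row names to 1, 2, ... once the ids live in their own column.

```r
# Made-up long-format data in the shape the xtabs() call expects
dishes <- data.frame(
  primary_id       = c(50792, 50792, 50793),
  subcategory_name = c("soup", "salad", "soup"),
  count            = c(2, 1, 3)
)

# Cross-tabulate: one row per primary_id, one column per subcategory
dish_pair <- as.data.frame.matrix(
  xtabs(count ~ primary_id + subcategory_name, dishes)
)

# Move the row names into a real column so it can be used as a join key
dish_pair <- data.frame(primary_id = rownames(dish_pair), dish_pair,
                        row.names = NULL)
```

Row names are always character, so primary_id comes out as text here; converting it with as.numeric may be needed before joining with a data frame whose primary_id is numeric.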