How can I get numeric data from all this character data? - r

In the data set I am using, the only numeric information is the measurement values, which are coded as 0 and 1. The remaining columns hold values such as location and education. How can I get numeric data from all this character data? By the way, I'm using R.
I have computed some frequency values, but I don't know what to do with columns like location and education.
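One possible direction, sketched under assumptions: the column names `measured`, `location` and `education` and all values below are hypothetical stand-ins for the data described above. Unordered categories can be turned into 0/1 dummy columns with `model.matrix()`, while categories with a natural order can be mapped to ranks.

```r
# Hypothetical data: one 0/1 measurement plus two character columns
df <- data.frame(
  measured  = c(0, 1, 1, 0),
  location  = c("north", "south", "north", "south"),
  education = c("primary", "secondary", "tertiary", "secondary"),
  stringsAsFactors = FALSE
)

# Option 1: dummy (0/1) columns. The first level of each variable
# becomes the reference category and gets no column of its own.
X <- model.matrix(~ location + education, data = df)[, -1]  # drop intercept
colnames(X)

# Option 2: if a variable has a natural order, map it to ranks.
# The ordering below is an assumption about what the labels mean.
edu_levels <- c("primary", "secondary", "tertiary")
df$education_rank <- as.integer(factor(df$education, levels = edu_levels))
```

Which option is appropriate depends on the analysis: ranks imply that the distance between neighbouring categories is meaningful, dummy columns do not.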

Related

replacing missing values in R with the one value that follows (not the mean)

I'm trying to replace missing values in R with the value that follows. I have annual income data by country, and for a missing income value, say 2001 for country A, I want to pull in the next value. (This is for time series analysis with multiple countries and separate columns for different variables; income is just one of them.)
I wrote this code to replace the missing values with the mean, but statistically I think it makes more sense to replace each missing value with the value right below it (the next year's value), since the numbers differ a lot between countries and the mean would be taken over all years for all countries.
Social_data_R<-within(Social_data_R,incomeNAavg[is.na(income)]<-mean(income,na.rm=TRUE))
I tried replacing the mean part of the code above with income[i+1], but R did not recognize 'i'. (I imported the data from Excel, so I didn't create the data frame manually.)
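A minimal sketch of filling each gap with the next available value within the same country. It assumes a grouping column named `country` (hypothetical, not shown in the question), rows sorted by year within each country, and made-up income values; `fill_next` is an illustrative helper, not an existing function.

```r
# Hypothetical data, sorted by year within each country
Social_data_R <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(2000, 2001, 2002, 2000, 2001, 2002),
  income  = c(10, NA, 30, 7, 8, NA)
)

# Walk backwards through x, copying the following value into each gap.
# A gap at the very end has no following value and stays NA.
fill_next <- function(x) {
  for (i in rev(seq_along(x))[-1]) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# ave() applies fill_next separately within each country,
# so values never leak from one country into another
Social_data_R$income_filled <- ave(
  Social_data_R$income, Social_data_R$country, FUN = fill_next
)
```

If installing a package is acceptable, `zoo::na.locf()` with `fromLast = TRUE` performs the same kind of fill.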

How do I convert a column of numerical data into strings in R?

This is my first Stack Overflow post. I'm terrible at R, so please bear with me.
I have a data frame called votedata and a column called district. The districts are labeled 1,1,1,2,2,2,3,3,4,4,4,5,5,5.
I want to regress lm(votepercent ~ votebuypercent + district)
So basically, I need to regress votepercent on votebuypercent, with districts as dummy variables. I tried converting districts into string data, but then I have distinct strings, which cause multicollinearity. What can I do?
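A common way to get district dummies is to wrap the column in `factor()` inside the formula. The sketch below uses made-up numbers for `votepercent` and `votebuypercent`; only the district labels come from the question.

```r
set.seed(1)  # hypothetical data, for illustration only
votedata <- data.frame(
  district       = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5),
  votebuypercent = runif(14, 0, 30)
)
votedata$votepercent <- 40 + 0.5 * votedata$votebuypercent + rnorm(14)

# factor() makes lm() treat district as categorical. R creates one
# dummy per district except the first, which is absorbed into the
# intercept, so the dummies are not perfectly collinear.
fit <- lm(votepercent ~ votebuypercent + factor(district), data = votedata)
coef(fit)
```

Each `factor(district)k` coefficient is then read as the difference between district k and the reference district 1.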

How to subtract 1 from a column in a dataframe?

I have a data frame with one column stored as a factor. I want to subtract 1 from every row in that column, but when I try, I get an error message saying that "-" is not meaningful for factors.
How can I do this?
Factors aren't numbers, even though there is a numbering system underneath them. So when you try to subtract 1 from a factor, R reports that the operation is not meaningful. This is a logic error, not a software error.
Did you want factors, or was your data converted to factors when you imported it? If you want numeric data, you can convert a factor to numbers with a single command such as as.numeric(as.character(x)).
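A minimal sketch of that conversion, assuming a hypothetical data frame `df` with a factor column `score` whose labels are digits. Going through `as.character()` matters, because `as.numeric()` on its own returns the internal level codes rather than the printed values.

```r
# Hypothetical data: a factor whose labels look like numbers
df <- data.frame(score = factor(c("10", "20", "20", "35")))

# Level codes, not the values you see printed
codes <- as.numeric(df$score)

# Convert the labels to text first, then to numbers
df$score_num <- as.numeric(as.character(df$score))

# Arithmetic now works as expected
df$score_minus1 <- df$score_num - 1
```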

Removing data frames from a list that contains a certain value under a variable in R

I currently have a list of 27 correlation matrices, each with 7 variables, for social science research.
Some correlations are "NA" due to missing data.
When I do the analysis, however, I do not analyse all variables in one go.
In one particular case, I would like to keep a variable conditionally: only if it contains at least one real value, i.e. something other than "NA". Since there are 7 variables, that means keeping any variable that does NOT consist of 6 "NA"s plus its correlation with itself, which is 1. This is the tricky part, because 1 is a value, but it is meaningless to me in a correlation matrix.
I would appreciate it if anyone could enlighten me regarding the code.
I am rather new to R, and my only idea is to use an if statement to set the condition. I have been trying for hours to no avail, as this is my first real coding experience.
Thanks a lot.
Since you didn't provide sample data, I will first convert your matrix into a data frame and then assume you want to check whether a variable var in your data frame df has at least one value other than NA or 1.
df <- as.data.frame(as.table(matrix)) should convert your matrix into a data frame.
table(df$var) will show you the distribution of values in that variable. From there you can make a judgement call on whether to keep the variable or not.
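The same check can also be automated across the whole list. The sketch below is one way to do it under assumptions: the matrices, the variable names `v1` to `v3`, the list name `cor_list` and the helper `has_real_value` are all hypothetical, and 3 variables are used instead of 7 to keep it short.

```r
# Hypothetical example: two 3x3 correlation matrices in a list
vars <- c("v1", "v2", "v3")
m1 <- matrix(c(1, 0.4, NA,
               0.4, 1, NA,
               NA, NA, 1), nrow = 3, dimnames = list(vars, vars))
m2 <- matrix(c(1, 0.2, 0.5,
               0.2, 1, NA,
               0.5, NA, 1), nrow = 3, dimnames = list(vars, vars))
cor_list <- list(a = m1, b = m2)

# Long format as suggested above: one row per pair of variables,
# with columns Var1, Var2 and Freq (the correlation)
long <- as.data.frame(as.table(m1))

# TRUE if `var` has at least one non-NA correlation with another
# variable; the diagonal entry (always 1) is ignored
has_real_value <- function(m, var) {
  others <- setdiff(colnames(m), var)
  any(!is.na(m[var, others]))
}

# Keep only the matrices in which v3 carries real information
kept <- Filter(function(m) has_real_value(m, "v3"), cor_list)
names(kept)
```

Excluding the diagonal by name avoids having to special-case the value 1, which could in principle also occur off the diagonal.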

Cluster analysis on two columns that contain name of person in R

I am a beginner in R. I have to do cluster analysis on data that contains two columns with names of persons. I converted it to a data frame, but the columns are of character type. To use the dist() function, the data must be numeric. An example of my data:
Interviewed.Type interviewed.Relation.Type
1. An1 Xuan
2. An2 The
3. An3 Ngoc
4. Bui Thi
5. ANT feed
7. Bach Thi
8. Gian1 Thi
9. Lan5 Thi
.
.
.
1100. Xung Van
I will be grateful for your help.
You can convert a character vector to a factor using factor. A factor is basically a vector of integer codes together with an attribute giving the text associated with each code; these texts are called levels in R. You can use as.numeric or unclass to get at the raw codes. These can then be fed into algorithms that require numbers, such as dist.
Note that the order in which numbers are associated with texts is pretty much arbitrary (in fact alphabetical), so the difference between two numbers has no meaning in most applications. Calling dist on this result is therefore technically possible, but not necessarily meaningful. For this reason, the author of this answer is not satisfied with it, even if the original poster seems to be happy about it. :-)
Also note that if there are several vectors, converting each one separately means that the same number will represent different textual values and vice versa, unless both vectors are composed of exactly the same set of distinct values. Additional care has to be taken if you want the same levels for both factors. One way would be to concatenate both vectors, turn the result into a factor, and then split it back into two factor vectors.
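The concatenate-and-split approach can be sketched as follows. The two name vectors `a` and `b` are hypothetical and only loosely modelled on the sample data.

```r
# Hypothetical name columns
a <- c("An", "Bui", "Lan")
b <- c("Thi", "An", "Thi")

# Converted separately, code 1 means "An" in a but also "An" only by
# coincidence; code 2 means "Bui" in a and "Thi" in b
sep_a <- as.integer(factor(a))
sep_b <- as.integer(factor(b))

# Concatenate, convert once, then split back into two factors
f  <- factor(c(a, b))
fa <- f[seq_along(a)]
fb <- f[length(a) + seq_along(b)]

# Both factors now share one set of levels, so equal codes
# always refer to the same name
levels(fa)
as.integer(fa)
as.integer(fb)
```

Subsetting a factor keeps its full level set, which is why `fa` and `fb` stay comparable after the split.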
