How to calculate combinations of a data frame column in R

I am a beginner in R.
I imported a csv file. The file contains only one column with 50 character values, but R classifies it as a data frame. I need all possible combinations of the elements of this column. I think I need to work with a vector, not a data frame. How can I do that?
Thank you!

Actually, your data frame already contains the vector you need. You can access it with
dataframe$column_name
The text before the $ operator names your data frame, and the text after it names the column, which is itself a vector. So when you run your calculations you can just write
some_function(dataframe$column_name)
In your specific case, with a single column, it may be simplest to convert the data frame into a plain vector. But when you start manipulating your data, you'll likely store more variables, and you'll want to keep those vectors organized as columns of a data frame.
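As a minimal sketch of the point above (the data frame `df` and its column `letters_col` are made-up names for illustration), extracting a column with `$` already gives an ordinary vector that can be passed to any function:

```r
# Hypothetical one-column data frame standing in for the imported csv
df <- data.frame(letters_col = c("a", "b", "c"), stringsAsFactors = FALSE)

# The column is already a plain character vector
v <- df$letters_col
is.data.frame(df)  # TRUE
is.vector(v)       # TRUE

# Any function that takes a vector can take the column directly
upper <- toupper(df$letters_col)
```

`stringsAsFactors = FALSE` is set explicitly so the column is character rather than factor on older R versions.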

Do you mean unlist?
You can use it to turn a data frame into a vector, and then use combn to get the combinations.
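A short sketch of that suggestion, assuming a small made-up data frame (`df` and `item` are illustrative names) and combinations of size 2:

```r
# Hypothetical one-column data frame
df <- data.frame(item = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Flatten the data frame into a plain vector
v <- unlist(df, use.names = FALSE)

# All combinations of 2 elements; each column of the result is one combination
pairs <- combn(v, 2)
n_pairs <- ncol(pairs)  # same as choose(4, 2)
```

combn takes the combination size as its second argument. With 50 elements, `choose(50, 2)` gives 1225 pairs, but the count grows very quickly for larger sizes, so it is worth checking `choose(50, m)` before calling `combn(v, m)`.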

Related

R function for identifying values from one column in another?

I have two different data frames, each of them consisting of a list of "genes" and a list of "interactors" (other genes). Is it possible with R to check whether there are any "genes" from one list that are also present in any of the columns of "interactors" from the other data frame, and vice versa?
I am quite new in R, so perhaps there is an easy way to perform this, but I don't even know how to look for it.
Thanks in advance!
Guillermo.
Can you please show a sample of your data?
In any case, I guess the following is close to what you need:
df_common <- df[df$genes %in% df$interactors, ]
This keeps the rows of df whose value in the column "genes" is also present (%in%) in the column "interactors" of the same data frame. For two data frames, put the second data frame's column on the right-hand side of %in%.
Is this what you are looking for? If not, please paste your input and the desired output.
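Here is a hedged sketch of the two-data-frame case from the question. The data frames `df1` and `df2` and their contents are invented for illustration, and each has a single "interactors" column:

```r
# Two hypothetical data frames with the same layout
df1 <- data.frame(genes = c("A", "B", "C"),
                  interactors = c("X", "Y", "D"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(genes = c("D", "E", "X"),
                  interactors = c("B", "Z", "A"),
                  stringsAsFactors = FALSE)

# Rows of df1 whose gene appears among df2's interactors
df1_in_df2 <- df1[df1$genes %in% df2$interactors, ]

# And the other direction: rows of df2 whose gene appears among df1's interactors
df2_in_df1 <- df2[df2$genes %in% df1$interactors, ]
```

If there are several interactor columns, one option is to pool them first, for example with `unlist()` over those columns, and use the pooled vector on the right-hand side of `%in%`.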

How to convert a column with a for loop and grep expressions?

I have a dataset of airbnb and one of the variables is amenities. The “amenities” column lists all the amenities provided by the host. What’s the total number of amenities offered? Convert this to a numeric value that indicates the number of amenities provided. For example, if an instance of “amenities” is {TV,Internet,Wifi,Washer}, it should convert to 4. Add this as a column to the dataframe. I am very confused on how to do this. Some of the amenities go up to 50 different amenities. Manually making vector would take forever.
I'm also confused on this as well for the airbnb dataset. Before we do any further analysis involving calculations, we should first clean the data for mathematical operations. For example, the character “$” appears in the “price” column, making the data type of “price” character instead of numeric. Remove the “$” and “,” in this column and convert the data type as numeric (modify the raw data). I believe I have to use grep expressions.
If you have that information in a data frame (here called df), you can use the strsplit function:
sapply(strsplit(df$amenities, ","), length)
To substitute or remove characters, try the gsub function.
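A minimal sketch of both steps, using a made-up data frame `listings` whose values only imitate the format described in the question:

```r
# Hypothetical sample in the format described in the question
listings <- data.frame(
  amenities = c("{TV,Internet,Wifi,Washer}", "{TV}", "{}"),
  price = c("$1,250.00", "$85.00", "$40.00"),
  stringsAsFactors = FALSE
)

# Strip the braces, split on commas, and count the pieces per row
inner <- gsub("[{}]", "", listings$amenities)
listings$n_amenities <- lengths(strsplit(inner, ",", fixed = TRUE))

# Remove "$" and "," from price, then convert to numeric
listings$price <- as.numeric(gsub("[$,]", "", listings$price))
```

Stripping the braces first means an empty set "{}" is counted as 0 rather than 1. This assumes amenity names never contain commas themselves; if they do, the split needs to be more careful.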

Why does R think my imported vector of characters are numbers?

This is probably a basic question, but why does R think my vector, which has a bunch of words in it, are numbers when I try to use these vectors as column names?
I imported a data set and it turns out the first row of data are the column headers that I want. The column headers that came with the data set are wrong ones. So I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks
You need to actually show us the data and the read.csv()/read.table() command you used to import it.
If R is treating a numeric column as strings, that is probably because the header row was read in as data, i.e. you omitted header=TRUE in your read.csv()/read.table() call.
But show us your actual data and the commands you used.
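One possible explanation, stated as an assumption since the data was never shown: if the columns were imported as factors (the default before R 4.0), the first row holds factor values, and using it directly as column names yields the integer codes instead of the labels. A sketch with made-up csv text:

```r
# Hypothetical csv whose real header ended up as the first data row
raw <- "id,name\n1,alice\n2,bob"
data <- read.csv(text = raw, header = FALSE, stringsAsFactors = TRUE)

# Convert each cell of the first row to character before using it as a name
new_names <- unname(vapply(data[1, ], as.character, character(1)))

# Drop the header row and apply the names
data <- data[-1, ]
colnames(data) <- new_names
```

Re-reading the file with `header = TRUE` avoids the problem entirely and also lets R pick proper column types.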

Transform Dataframe to vector in R

I am attempting to pull a table from SQLServer and convert it to a vector in R.
I use sqlQuery() to return the table, which looks to be returned as a dataframe. I am curious, can I change all the values in this dataframe to be a vector?
I am currently using as.vector(nameofdataframe), which gives me a list. I find that if I use as.vector(dataframe$column), it returns a vector, but I have many columns and I feel like there should be a much simpler way.
I was able to figure it out. If you take the data frame resulting from a sqlQuery() you need to use as.matrix first and then as.vector to the resulting matrix. Thank you all for your help.
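A small sketch of that approach, with a made-up data frame `df` standing in for the sqlQuery() result:

```r
# Hypothetical stand-in for the data frame returned by sqlQuery()
df <- data.frame(a = 1:2, b = 3:4)

# as.matrix() first, then as.vector(): values come out column by column
v1 <- as.vector(as.matrix(df))

# unlist() gives the same values in one step
v2 <- unlist(df, use.names = FALSE)
```

Note that if the columns have mixed types, as.matrix() converts everything to a common type (typically character), so this works best when all columns are numeric.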

What's the easiest way to ignore one row of data when creating a histogram in R?

I have this csv with 4000+ entries and I am trying to create a histogram of one of the variables. Because of the way the data was collected, there was a possibility that if data was uncollectable for that entry, it was coded as a period (.). I still want to create a histogram and just ignore that specific entry.
What would be the best or easiest way to go about this?
I tried making it so that the histogram would only use the data for every entry except the one with the period by doing
newlist <- data1$var[1:3722]+data1$var[3724:4282]
where 3723 is the entry with the period, but R said that + is not meaningful for factors. I'm not sure if I went about this the right way, my intention was to create a vector or list or table conjoining those two subsets above into one bigger list called newlist.
Your problem is deeper than you realize. When R read in the data and saw the lone . it interpreted that column as a factor (categorical variable).
You need to either convert the factor back to a numeric variable (this is FAQ 7.10) or reread the data, forcing that column to be read as numeric. If you are using read.table or one of the functions that call read.table, you can set the colClasses argument to specify a numeric column.
Once the column is a numeric variable, a negative subscript or !is.na will work (and some functions will automatically ignore the missing value).
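A hedged sketch of both routes, using made-up csv text in place of the real file. The `na.strings` argument shown in the second route is a standard read.csv/read.table option, though it is not the one mentioned in the answer above:

```r
# Hypothetical csv where one missing entry was coded as "."
raw <- "var\n1.5\n.\n2.5\n3.0"

# Route 1: the column came in as a factor; convert via character (FAQ 7.10)
data1 <- read.csv(text = raw, stringsAsFactors = TRUE)
x <- suppressWarnings(as.numeric(as.character(data1$var)))  # "." becomes NA

# Route 2: reread, telling R that "." means missing
data2 <- read.csv(text = raw, na.strings = ".")

# Drop the missing value and compute the histogram
clean <- x[!is.na(x)]
h <- hist(clean, plot = FALSE)  # drop plot = FALSE to draw it
```

Converting through `as.character()` matters: calling `as.numeric()` directly on a factor returns the internal level codes, not the original values.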
