I have a problem with a data table in R. I created a data table with 1 row and roughly 60 columns using the "sample" function (e.g. column 1 <- sample(data, 1), and so on). Each column was sampled from either two values (yes/no) or five values (a/b/c/d/e), so I ended up with a one-row data table in which each column contains 'yes', 'no', or one of a/b/c/d/e. The problem is that I need all combinations. I tried the 'expand.grid' function, but I got stuck with 1 million rows, so I need more control. Is it possible to add another empty row to the existing data table and fill it with the remaining possibilities, then add a third row and repeat? I mean: if the 1st column has 'yes' in the 1st row, it should have 'no' in the 2nd row, and so on. Please let me know if you have any idea which function I could use. I have spent many hours looking for an answer. Thanks a lot for your help.
Related
I have a messy dataset with multiple entries in some cells. The numbers in parentheses refer to the specific columns "(1)", "(2)", and "(3)". In this example, in a cell with multiple entries, 30 refers to column (2) and 20 refers to column (1). There is no information for column (3).
I would like to split up/extract the values in the cells and create 3 additional columns.
Several hundred cells are affected in several columns.
Dataset
In the end I would like to have 3 new columns for each column affected. Any idea how I do that? I'm still a rookie so help is much appreciated!
I have a dataframe with multiple columns and I want to apply different functions on each column.
An example of my dataset -
I want to calculate the count of column pq110a for each country listed in the qcountry2 column (me - Mexico, br - Brazil, ar - Argentina). The problem is that I have to filter on these columns. For example, for sample patients I want:
Count of pq110 when the values are 1 and 2 (for some patients)
Count of pq110 when the value is 3 (for other patients)
Similarly when the value is 6.
For total patients I want the total count of pq110.
The output I am expecting is: Output
Similarly, I want this output for each country.
Please suggest how I can do this for other columns as well, country-wise.
Thanks !!
I guess what you want to do is count the number of rows of 'pq110' that have the same value within each 'qcountry2'.
So I'd use 'tapply' to split the data into subsets by country and then 'table' to count the rows for each distinct value.
tapply(my_data[, "pq110"], INDEX = as.factor(my_data[, "qcountry2"]), FUN = table)
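The same idea can be sketched with 'table' alone, plus a filter for the cases where only some values should be counted. This is only a sketch: the sample data below is made up, and the column names 'qcountry2' and 'pq110' are taken from the question.

```r
# Hypothetical sample data; the column names follow the question.
my_data <- data.frame(
  qcountry2 = c("me", "me", "br", "br", "br", "ar"),
  pq110     = c(1, 2, 1, 3, 3, 6)
)

# Cross-tabulate country against pq110 value in one call.
counts <- table(my_data$qcountry2, my_data$pq110)
print(counts)

# Count only selected values per country, e.g. pq110 equal to 1 or 2.
subset_12 <- my_data[my_data$pq110 %in% c(1, 2), ]
counts_12 <- table(subset_12$qcountry2)
print(counts_12)

# Total number of non-missing pq110 entries per country.
totals <- tapply(my_data$pq110, my_data$qcountry2, function(x) sum(!is.na(x)))
print(totals)
```

Filtering first and tabulating afterwards keeps each count explicit, which may be easier to repeat for other columns.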
I have a datatable where one of the columns should be expressed in dollars and some as percentages. I've been looking around and I'm still not sure how to do it - seems like it would be easy?
The trickier part is I have another data table where only certain entries need to be expressed as dollars (i.e. not whole rows or whole columns) - is there a way to handle this?
Imagine your datatable (myData) is 2 columns by 10 rows.
You want the second column to be in dollars:
myData[,2]<-sapply(myData[,2],function(x) paste0("$",x))
Or, you want rows 6 to 10 in the first column to be percentages:
myData[6:10,1]<-sapply(myData[6:10,1],function(x) paste0(x,"%"))
Or, you want rows 1 to 5 in the second column to be in dollars, you can do:
myData[1:5,2]<-sapply(myData[1:5,2],function(x) paste0("$",x))
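Since 'paste0' already works on whole vectors, the same formatting can probably be done without 'sapply'. A minimal sketch, assuming a made-up 10 x 2 data frame (the column names 'share' and 'price' are hypothetical):

```r
# Hypothetical 10 x 2 data frame standing in for myData.
myData <- data.frame(share = 1:10, price = seq(10, 100, by = 10))

# Work on a copy so the numeric original stays available for calculations.
formatted <- myData

# paste0() is vectorized, so the whole column is formatted in one call.
formatted$price <- paste0("$", myData$price)

# Convert to character first so formatted and plain cells can share a column,
# then format only rows 6 to 10 as percentages.
formatted$share <- as.character(myData$share)
formatted$share[6:10] <- paste0(formatted$share[6:10], "%")

print(formatted)
```

Note that the formatted columns are character, not numeric, so it is usually safer to format a copy just before display.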
I would like to divide every number in all columns by 1000. I would like to omit the row header and the 1st column from this function.
I have tried this code:
TEST2=(TEST[2:503,]/(1000))
But it is not what I am looking for. My dataframe has 503 columns.
Is TEST a data frame? In that case, the row header won't be divided by 1000. To choose all columns except the first, use an index in the j position (the second position inside the brackets), e.g.
TEST[, 2:ncol(TEST)]/1000 # selects every row and the 2nd to last columns
# same thing
TEST[, -1]/1000 # selects every row and every column but the 1st
Or you can select columns by name, etc. (you select columns the same way you are selecting rows at the moment).
Take a look at ?'[' to learn how to select particular rows and columns.
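If the first column should be kept next to the rescaled values, one option is to assign the result back into a copy. A small sketch, assuming a made-up three-column stand-in for TEST (the column names are hypothetical):

```r
# Hypothetical small data frame standing in for TEST:
# the first column is an identifier, the rest are numeric.
TEST <- data.frame(id = c("a", "b", "c"),
                   x  = c(1000, 2000, 3000),
                   y  = c(500, 1500, 2500))

# Divide every column except the first by 1000, keeping the first column as is.
TEST2 <- TEST
TEST2[, -1] <- TEST[, -1] / 1000

print(TEST2)
```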
I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the second two.
I know how to do it when the columns have names ~ plot(data$V1, data$V2) but in this case they do not. How do I access each column individually when they do not have names?
Thanks
Why not give them sensible names?
names(data) <- c("This", "That", "Other")
plot(data$This, data$That)
That's a better solution than using the column number, since names are meaningful, and code that relies on column numbers may break in several places if your data changes to have a different number of columns. Give your data the correct names, and as long as you always refer to data$This your code will keep working.
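To plot the first column against both of the others, a sketch along these lines might work. The data here is made up, and the names "This", "That" and "Other" are just the placeholders from above:

```r
# Hypothetical 3-column data set created without meaningful headers.
data <- data.frame(1:5, (1:5)^2, (1:5) * 3)
names(data) <- c("This", "That", "Other")

# Plot the first column against the second, with a y-range wide enough
# for both series, then overlay the third column in another colour.
plot(data$This, data$That, ylim = range(data$That, data$Other))
points(data$This, data$Other, col = "red")
```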
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the 2nd to columns. Here, I left the 1st number out, so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it follows matrix notation. E.g., a 4x3 matrix has 4 rows and 3 columns. Thus, when I want to select the 1st row of the 3rd column, I could do something like matrix[1,3]
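A quick way to check the row/column order is to try it on a small matrix. The objects below are made up for illustration:

```r
# A 4x3 matrix: 4 rows, 3 columns, filled column by column.
m <- matrix(1:12, nrow = 4, ncol = 3)

col3 <- m[, 3]   # all rows of the 3rd column
elem <- m[1, 3]  # 1st row of the 3rd column

# The same positional indexing works on a data frame.
df <- as.data.frame(m)
first_col <- df[[1]]  # 1st column as a plain vector

print(col3)
print(elem)
print(first_col)
```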