I have a column whose values are supposed to be 11 digits long; however, some of them have fewer than 11 digits. How can I write a query that returns all the values in this column that have fewer than 11 digits? I am working in Teradata.
Related
I have a CSV file with 11 columns; I am ignoring the first 8 for now, and the last 3 (9-11) are the important ones. Some data in column 9 is missing, and these cells show up as NA. I can fill in these cells by multiplying column 11 by column 10.
I want to create a data frame where all of column 9 is filled in and save that as a new CSV file. I first tried to multiply the columns. This worked and I got the missing data from column 9. Then I tried to merge the new column 9 with the column 9 from my data frame but R just attached the 2 columns together.
I would like the calculated values to replace the NA cells in the original data frame (so I end up with a full column 9). I would also like to multiply the columns only for the rows with NA cells, so that no original data is replaced. How can I do that?
col_9 <- matrix(dat[,10] * dat[,11], ncol=1)
print(col_9)
You can use the ifelse function:
dat[, 9] <- ifelse(is.na(dat[, 9]), dat[, 10] * dat[, 11], dat[, 9])
Where the condition is TRUE (i.e. is.na(dat[, 9])), the value is replaced by the second argument (dat[, 10] * dat[, 11]); otherwise it is replaced by the third one (i.e. dat[, 9], so the value is kept).
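A small worked example may make this concrete. The data frame, its column names, and the output file name below are invented for illustration; only the ifelse() line comes from the answer above.

```r
# Hypothetical 11-column data frame: columns 1-8 are filler, 9-11 matter.
dat <- as.data.frame(matrix(0, nrow = 4, ncol = 8))
dat$total <- c(7, NA, 20, NA)   # column 9, with missing values
dat$price <- c(2, 3, 4, 5)      # column 10
dat$qty   <- c(3, 7, 5, 2)      # column 11

# Fill only the NA cells of column 9; existing values are left untouched.
dat[, 9] <- ifelse(is.na(dat[, 9]), dat[, 10] * dat[, 11], dat[, 9])

# Save the completed data frame (the file name is an assumption).
out_file <- file.path(tempdir(), "filled.csv")
write.csv(dat, out_file, row.names = FALSE)
```

The first row keeps its original value of 7 even though 2 * 3 is 6, which shows that only the NA cells are recomputed.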
I have a data frame with 21 columns. Columns 4 onwards are pairs of values (numerator and denominator). I want to divide the two and place the result into the first column of each pair, i.e. I want column 4 to become the result of column 4 divided by column 5, then I want column 6 to be the result of column 6 divided by column 7, and so on.
I know (or at least can find on Google) how to do this easily enough with reference to the column names, but I would prefer not to use these and rather refer to the column index.
It can be done by dividing two equal-sized subsets of the data frame. The numerator is the set of columns from the 4th up to the one before the last column, and the denominator is the subset from the 5th to the last column. Update the data by assigning the result back to the numerator column index subset:
df1[4:(ncol(df1)-1)] <- df1[4:(ncol(df1)-1)]/df1[5:ncol(df1)]
NOTE: This assumes the columns are of numeric class.
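Note that the one-liner above divides every column from 4 onward by its right-hand neighbour, so the denominator columns (5, 7, ...) are overwritten as well. If only the numerator columns should change, one possible sketch steps through the column indices in strides of two. The data frame below is made up for illustration, and numeric columns are assumed from column 4 onward.

```r
# Hypothetical data: 3 leading columns, then two numerator/denominator pairs.
df1 <- data.frame(id = 1:2, a = c("x", "y"), b = c(TRUE, FALSE),
                  n1 = c(10, 20), d1 = c(2, 5),
                  n2 = c(9, 8),   d2 = c(3, 4))

# Indices of the numerator columns: 4, 6, 8, ...
num <- seq(4, ncol(df1) - 1, by = 2)

# Divide each numerator column by the column immediately to its right;
# the denominator columns are not assigned to, so they stay as they were.
df1[num] <- df1[num] / df1[num + 1]
```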
I have a data set that contains a column with strings made up of 4 letters (A,T,C,G); these strings range from 2-1991 characters long. I would like to subset all rows where the strings match a particular pattern. For example, I would like to create a new dataframe that subsets all rows where there are 0-10 consecutive Ts in column 17.
Please let me know if you require additional information and thank you for your time!
You could filter out every row that contains a run of 11 consecutive Ts. Because the pattern matches anywhere in the string, this drops rows with a run of exactly 11 Ts as well as rows with longer runs.
## Example vector
v = c("TTTTTTTTTTACAGATAT","TTTACACAC","TTTTTTTTTTTTTACAGAT","TTTTTTTTTTTACAG")
v[!grepl("T{11}",v)]
[1] "TTTTTTTTTTACAGATAT" "TTTACACAC"
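To apply the same idea to a data frame rather than a vector, the logical vector from grepl() can be used as a row index. A minimal sketch, with an invented data frame standing in for the real one (the sequences sit in column 2 here rather than column 17, and the names dna and seq_col are hypothetical):

```r
# Hypothetical data frame; seq_col is the index of the sequence column.
dna <- data.frame(id = 1:4,
                  s = c(strrep("T", 10), "TTTACACAC",
                        paste0(strrep("T", 13), "ACAGAT"),
                        paste0("AC", strrep("T", 11))),
                  stringsAsFactors = FALSE)
seq_col <- 2

# Keep rows whose sequence has no run of 11 or more consecutive Ts.
short_runs <- dna[!grepl("T{11}", dna[[seq_col]]), ]
```

Referring to the column by index with dna[[seq_col]] means the code does not depend on the column's name.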
Edit to also include cases where you want to look for 11-20 consecutive Ts
If you want to select rows that have between 11 and 20 Ts, you could use a negative lookbehind and a negative lookahead, to search for a stretch of between 11 and 20 Ts that is neither preceded nor followed by a T.
## Second example vector:
v2 = c("TTTTTTTTTTACAGATAT","TTTACACAC","TTTTTTTTTTTTTACAGAT","TTTTTTTTTTTACAG","ACTTTTTTTTTTTTTTTTTTTTTGCGCA")
v2[grepl("(?<!T)T{11,20}(?!T)",v2,perl=TRUE)]
[1] "TTTTTTTTTTTTTACAGAT" "TTTTTTTTTTTACAG"
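As a cross-check on the lookaround pattern, the length of every run of Ts can also be computed directly with gregexpr() and regmatches(). This is an alternative sketch, not part of the original answer; the helper name has_run_between and the vector v3 are made up for illustration.

```r
# Hypothetical helper: TRUE where a string contains at least one maximal
# run of Ts whose length lies between lo and hi (inclusive).
has_run_between <- function(x, lo, hi) {
  runs <- regmatches(x, gregexpr("T+", x))   # all maximal runs of Ts
  vapply(runs, function(r) any(nchar(r) >= lo & nchar(r) <= hi), logical(1))
}

# Runs of 10, 11, 20 and 21 Ts respectively.
v3 <- c(strrep("T", 10), paste0("A", strrep("T", 11), "C"),
        paste0(strrep("T", 20), "G"), strrep("T", 21))
keep <- has_run_between(v3, 11, 20)
```

On this input, keep agrees with grepl("(?<!T)T{11,20}(?!T)", v3, perl = TRUE): only the strings with runs of 11 and 20 Ts are selected.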
I looked everywhere but did not find an answer to my question. I am having trouble making a contingency table. I have data with many columns, let's say 1, 2 and 3. In the first column there are, say, 100 different values, in the second 20, and the third column has 2 possible values: 0 and 1. First I take just the data with value 1 in column 3 (data<-data[Column3==1,]). Now I have only around 20 different values in column 1 and 5 in column 2. However, when I make a contingency table its size is 100x20, not 20x5, and it contains a lot of zeros (they correspond to combinations of column 1 and column 2 that have value 0 in column 3). I would be grateful for any help, thanks.
I guess all three of your variables are factors, in which case the levels that no longer occur are still kept after subsetting. Convert all three variables to character using
as.character()
and then apply
table()
to the converted variables.
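An alternative to converting to character is droplevels(), which discards factor levels that no longer occur after subsetting. A minimal sketch on invented data (the column names Column1 to Column3 follow the question; the values are made up):

```r
# Hypothetical data with factor columns.
data <- data.frame(Column1 = factor(c("a", "b", "c", "a")),
                   Column2 = factor(c("x", "y", "x", "x")),
                   Column3 = c(1, 0, 1, 1))

# Subsetting alone keeps the unused levels "b" and "y".
kept <- data[data$Column3 == 1, ]
tab_before <- table(kept$Column1, kept$Column2)

# droplevels() removes levels that no longer appear in the subset.
sub <- droplevels(kept)
tab_after <- table(sub$Column1, sub$Column2)

dim(tab_before)
dim(tab_after)
```

The second table is smaller because the all-zero rows and columns for the unused levels are gone.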
Say I have a data.frame of arbitrary dimensions (n by p). I want to extract a vector of length n from that data.frame, one element in the vector per row in the data.frame. However, the column in which each element lies may vary by row. Is there a way to do this without loops?
For example, suppose I have the following (3x3) data frame, called, say, DATA:
X Y Z
1 17 43
3 4 2
6 9 0
I want to extract one scalar value from DATA per row. I have a vector, call it column.list, c(1,3,1) (arbitrarily selected in this case) which gives the column index for the elements I want, where the kth element of column.list is the column index for row k in DATA. How do I do this without loops? I want to avoid loops because I am using this repeatedly in a simulation study that will take a lot of running time even without loops, and the row number might be 100,000 or so. Much appreciated!
You can do this by indexing your data.frame with a matrix. The first column indicates row, the second indicates column. So if you do
column.list <- c(1,3,1)
DATA[cbind(1:nrow(DATA), column.list)]
You will get
[1] 1 2 6
as desired. If you mix across columns of different classes, all the values will be coerced to the most accommodating data type.
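For a self-contained check, the 3x3 example from the question can be rebuilt and indexed this way. The second half illustrates the coercion caveat on a made-up mixed-type data frame (MIXED is a hypothetical name).

```r
# The example data frame from the question.
DATA <- data.frame(X = c(1, 3, 6), Y = c(17, 4, 9), Z = c(43, 2, 0))
column.list <- c(1, 3, 1)

# One (row, column) pair per row of the index matrix.
idx <- cbind(seq_len(nrow(DATA)), column.list)
picked <- DATA[idx]

# Hypothetical mixed-type data frame: numeric and character columns.
MIXED <- data.frame(n = c(1, 2, 3), s = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
mixed_picked <- MIXED[cbind(1:3, c(1, 2, 1))]

# The numeric entries come back as character strings here.
is.character(mixed_picked)
```

seq_len(nrow(DATA)) is used instead of 1:nrow(DATA) so that a zero-row data frame gives an empty index rather than c(1, 0).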