Teradata Query To Solve Below Scenario - teradata

I have the scenario below and want to write a query that produces the expected output.
Can you please help me with this?
Input :
Column1 Column2
X 1
X 2
X 3
Y 1
Y 2
Expected Output :
Column1 Column2
X 1,2,3
Y 1,2
Thanks.

You can do it like this:
SELECT Column1,
       TRIM(TRAILING ',' FROM (XMLAGG(Column2 || ',' ORDER BY Column2) (VARCHAR(100)))) AS Column2
FROM database.table
GROUP BY 1

Related

How to delete a number of rows after finding a specific value in R

I have a file in which the results of an analysis for a series of samples are exported one below the other. For some of the samples, additional information that is not of interest is exported as well.
This is the file structure:
col1 <- c('RESULTADOS','ID','result','RESULTADOS','ID','result','INFO ADICIONAL','ID','Extra','Extra2','RESULTADOS','ID','result')
col2 <- c('','1','f','','2','f','','2','q','w','','3','m')
df2 <- data.frame(col1,col2)
And this is the result I would like to get:
Datos <- c('ID','result','ID','result','ID','result')
col2 <- c('1','f','2','f','3','m')
df2 <- data.frame(Datos,col2)
I was wondering if there is any way in R to use a loop or a similar structure to iterate through the rows and, each time it finds a cell with "INFO ADICIONAL", delete that row and the 3 following ones.
I tried filter(!col1 %in% c("INFO ADICIONAL", "Extra", "Extra2")), but that leaves a stray ID row behind, and I can't filter on "ID" because I need that information in the other sections.
Once those lines are discarded, I would need to take the value of col2 as the ID wherever col1 is 'ID'. I tried a few ways but I just can't get to that result.
Any help would be much appreciated!
You can use data.table::shift():
library(data.table)
setDT(df2)
df2[col1 != "RESULTADOS" & !apply(df2[, shift(col1, 0:3)], 1, \(x) "INFO ADICIONAL" %in% x)]
Output:
col1 col2
1: ID 1
2: result f
3: ID 2
4: result f
5: ID 3
6: result m
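You can do this (note that the RESULTADOS rows are kept):
# Row numbers where an unwanted block starts
borra <- which(df2$col1 == "INFO ADICIONAL")
# Drop each of those rows plus the 3 rows that follow
df2[-c(borra, borra+1, borra+2, borra+3),]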
Output:
col1 col2
1 RESULTADOS
2 ID 1
3 result f
4 RESULTADOS
5 ID 2
6 result f
11 RESULTADOS
12 ID 3
13 result m
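If the RESULTADOS rows should go as well, a base R sketch along the same lines might look like this. It assumes every "INFO ADICIONAL" block is exactly 4 rows long (the marker row plus the 3 that follow), as in the question; the names start, drop, keep and res are only illustrative.

```r
col1 <- c('RESULTADOS','ID','result','RESULTADOS','ID','result',
          'INFO ADICIONAL','ID','Extra','Extra2','RESULTADOS','ID','result')
col2 <- c('','1','f','','2','f','','2','q','w','','3','m')
df2 <- data.frame(col1, col2)

# Rows where an unwanted block starts
start <- which(df2$col1 == "INFO ADICIONAL")
# Each start row plus the 3 rows after it; outer() builds all offsets at once
drop <- as.vector(outer(start, 0:3, `+`))
# Ignore offsets that would run past the last row
drop <- drop[drop <= nrow(df2)]

# setdiff() avoids negative indexing, which returns zero rows when 'drop' is empty
keep <- setdiff(seq_len(nrow(df2)), drop)
res <- df2[keep, ]
# Finally remove the RESULTADOS header rows and renumber
res <- res[res$col1 != "RESULTADOS", ]
rownames(res) <- NULL
res
```

Building the index with setdiff() rather than df2[-drop, ] keeps the code safe for files that contain no "INFO ADICIONAL" block at all.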

Column is being returned with the negative sign at the end of the field in R

I'm trying to reproduce in RStudio what this SQL Server code does:
CASE
WHEN COLUMN1 LIKE '%-%'
THEN CAST(REPLACE(COLUMN1, '-', '') AS NUMERIC) * -1
ELSE COLUMN1
END VALUE
I'm using data.table because my file is 2 GB. I've tried to use this:
MYDATATABLE[, newfield:=ifelse((COLUMN1 %like% '-'), ***replace '-' with nothing and multiply the result by -1***, COLUMN1)]
The column is being returned with the negative sign at the end of the value, like this:
COLUMN1
--------
55.400-
60.440-
61.280-
136.400-
506.333-
If I understood your question correctly, you can try a two-step process.
Instead of the slow ifelse, first create a new column new_col for the rows where col1 contains "-", assigning the negative of col1's numeric value after removing the "-". Then fill the remaining NA values in new_col with the numeric value of col1.
library(data.table)
DT[grepl("-", col1),
new_col := - as.numeric(gsub("-", "", col1))][is.na(new_col), new_col := as.numeric(col1)]
which gives
> DT
col1 id new_col
1: -123 1 -123
2: 1233- 2 -1233
3: 45 3 45
Sample data:
DT <- data.table(col1 = c("-123","1233-", "45"),
id = c(1,2,3))
# col1 id
#1: -123 1
#2: 1233- 2
#3: 45 3
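Another option, sketched in base R: since the minus sign only needs to move from the back to the front, a single regular expression can put it there before converting. This assumes the values use "." as the decimal mark; x is a made-up vector standing in for the text column.

```r
# Hypothetical sample values, including ones that are already well-formed
x <- c("55.400-", "60.440-", "-123", "45")

# Move a trailing "-" to the front; strings without one are left unchanged
fixed <- sub("^(.*)-$", "-\\1", x)

# Convert the repaired strings to numbers
num <- as.numeric(fixed)
num  # values: -55.4, -60.44, -123, 45
```

Because sub() and as.numeric() are vectorised, the same expression can be used on the right-hand side of a data.table := assignment without ifelse.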

ColSum of Characters

I'm looking to create a total column that counts the number of cells in a particular row that contains a character value. The values will only be 1 of 3 different letters (R or B or D). So using the example from the script below, outcomes will be: p1=2, p2=1, p3=2, p4=1, p5=1
I'm thinking using nrow with a condition would be the way to go, but I'm unable to get that to work. Something like this:
# count the number of columns where the cell is not blank (or contains an R,B,D)
test$tot <- nrow(!test="")
Any help is much appreciated!
---------------- script for reference:
column1 <- c("p1","p2","p3","p4","p5")
column2 <- c("R","","R","R","")
column3 <- c("","B","","","B")
column4 <- c("D","","D","","")
test <- data.frame(column1,column2,column3,column4)
colnames(test)[c(1:4)] <- c("pol_nbr","r1","r2","r3")
View(test)
Using rowSums:
test$tot_cols <- rowSums(test[, -1] != "")
Convert factors to characters and use nzchar():
test <- data.frame(column1, column2, column3, column4, stringsAsFactors = FALSE)
test$ne <- rowSums(apply(test[-1], 2, nzchar))
test
  column1 column2 column3 column4 ne
1      p1       R               D  2
2      p2               B          1
3      p3       R               D  2
4      p4       R                  1
5      p5               B          1

How to print a column when another column meets a condition

Example Data
a <- c(1,2,2,3)
b <- c(1,2,3,4)
dat <- data.frame(a,b)
I would like to print the values of column 2 for the rows where column 1 is >= 2.
which(dat[,1]>=2)
This only shows which rows of column 1 are >= 2.
I expect it to show:
[1] 2 3 4
Sorry for my bad English; I hope you can understand it.
If we need the corresponding values in the 2nd column, use [ with a logical index:
dat[,2][dat[,1]>=2]
#[1] 2 3 4

Select most frequent elements by a variable

If I have a data frame that looks like this:
id=c(1,1,1,2,2,3,3,3,3)
ans=c(1,1,3,3,3,4,3,1,4)
d=cbind(id,ans)
How can I select the most frequent answer per ID?
I would like to return a data frame that looks like this:
id=c(1,2,3)
ans=c(1,3,4)
d.out=cbind(id,ans)
You need a 2-way table, and then find the max count in each row:
tab <- table(id, ans)
data.frame(id=rownames(tab), ans=colnames(tab)[max.col(tab)])
What about this?
res <- sapply(split(ans, id), function(x) names(sort(table(x),decreasing=TRUE)[1]))
data.frame(id = names(res), ans = res)
id ans
1 1 1
2 2 3
3 3 4
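One caveat on ties: max.col() breaks ties at random by default (ties.method = "random"), so the first answer can vary between runs when two answers are equally frequent for an id. Below is a sketch with tapply() and which.max(), which always takes the first of the tied values in table() order; mode_of is a hypothetical helper name, not something from the answers above.

```r
id  <- c(1,1,1,2,2,3,3,3,3)
ans <- c(1,1,3,3,3,4,3,1,4)

# Most frequent value of x; on ties, which.max() picks the first entry of table(x)
mode_of <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Apply the helper to the answers of each id
res <- tapply(ans, id, mode_of)

# Convert the names and values back to numbers for the output data frame
d.out <- data.frame(id  = as.numeric(names(res)),
                    ans = as.numeric(as.vector(res)))
d.out
```

With the sample data there are no ties, so this should agree with both answers above.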
