DolphinDB: Most efficient way to combine a STRING and a FLOAT vector into a row of a table?

I want to combine a STRING scalar and a FLOAT vector into a single table row whose columns have the types [STRING, FLOAT, ..., FLOAT].
Suppose the variables are defined as follows:
str="a"
v=[0.3, 0.6, 0.7, 0.8]
I expect the output to be:
strCol col0 col1 col2 col3
a 0.3 0.6 0.7 0.8

The following script works for vectors of any length:
str="a"
v=[0.3, 0.6, 0.7, 0.8]
t = table(matrix(v).transpose())
update t set strCol = str
reorderColumns!(t,`strCol)
Check table t:
strCol col0 col1 col2 col3
a 0.3 0.6 0.7 0.8

Related

Kusto calculation with Loop

I am trying to do an iterative calculation, but it does not seem to be possible. Does anyone know of a workaround?
This is what my table looks like:
Column1 Column2 Todo
A B 0.5
A C -0.3
A C -0.3
What I want to see:
Column1 Column2 Todo Calculated
A B 0.5 1.0
A C -0.3 0.7
A C -0.3 0.6
The starting value is 0.5, and the first row adds 0.5 to it. The second row subtracts 0.3 from the result of the first row, and so on. If the result falls below zero, it has to be set to 0.0.
It would be great to have some help here.
Thanks in advance
Based on what I understood, it looks like you're trying to get the cumulative sum of a column. You could use row_cumsum() after setting the right sort order:
let T = datatable(Column1:string, Column2:string, Todo:double)
[
"A", "B", 0.5,
"A", "C", -0.3,
"A", "C", -0.3,
];
T
| sort by Column1
| serialize Calculated = row_cumsum(Todo) + 0.5
I didn't get how the last row in your example ended up as 0.7 - 0.3 = 0.6? Shouldn't it be 0.4?
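
The question also asks for the result to be set to 0.0 whenever it drops below zero, which a plain cumulative sum does not do. Here is a minimal sketch of such a clamped running total, written in Python purely to illustrate the logic. The function name `clamped_running_total` and the choice to clamp after every step are assumptions, not something the question or the answer spells out:

```python
def clamped_running_total(values, start=0.5):
    """Add each value to a running total, never letting it drop below 0.0."""
    total = start
    results = []
    for value in values:
        # Clamp after every step so a negative total is reset to 0.0.
        total = max(0.0, total + value)
        results.append(total)
    return results


todo = [0.5, -0.3, -0.3]
calculated = clamped_running_total(todo)
print([round(x, 2) for x in calculated])
```

With the sample data the clamp never triggers, so the rounded values match the cumulative sum (1.0, 0.7, 0.4). It only matters once the total would go negative.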

How to assign weights to strings in WHERE clause?

The table is
id col1 col2
1 former good
2 future fair
3 now bad
4 former good
.............
GOAL : I need to SELECT only those rows that have a cumulative score higher than 0.8
1) If col1 = 'former' THEN the row gets 0.2 points, if 'now' THEN 0.7, if 'future' THEN 0.3
2) If col2 = 'good' THEN the row gets 0.8 points, if 'bad' THEN 0.1, if 'fair' THEN 0.5
Therefore I need to assign numeric values in the WHERE clause. I want to avoid changing values in the SELECT because the user needs to see the labels ('good', 'now', etc.) and not the numbers.
How can I do this?
SELECT *
FROM mytable
WHERE ?
Use a CASE expression per column to assign a weight based on your logic, then compare the sum with the threshold:
SELECT *
FROM mytable
WHERE
CASE col1
WHEN 'former' THEN 0.2
WHEN 'now' THEN 0.7
WHEN 'future' THEN 0.3
ELSE 0
END +
CASE col2
WHEN 'good' THEN 0.8
WHEN 'bad' THEN 0.1
WHEN 'fair' THEN 0.5
ELSE 0
END > 0.8
If col1 and col2 actually store numeric values, a plain comparison is enough:
SELECT * FROM mytable WHERE col1 + col2 > 0.8
But please show us the real structure of the table.
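
To see the CASE-based filter run end to end, here is a small sketch using Python's built-in `sqlite3` module with the four sample rows from the question. The in-memory database, the column types and the `ORDER BY id` are assumptions made only for this illustration. The real table structure is unknown.

```python
import sqlite3

# In-memory database holding the sample rows from the question (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "former", "good"),
        (2, "future", "fair"),
        (3, "now", "bad"),
        (4, "former", "good"),
    ],
)

# The weights live only in the WHERE clause, so the labels are returned unchanged.
query = """
SELECT id, col1, col2
FROM mytable
WHERE
    CASE col1 WHEN 'former' THEN 0.2 WHEN 'now' THEN 0.7 WHEN 'future' THEN 0.3 ELSE 0 END
  + CASE col2 WHEN 'good' THEN 0.8 WHEN 'bad' THEN 0.1 WHEN 'fair' THEN 0.5 ELSE 0 END
  > 0.8
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Only the 'former'/'good' rows pass the filter. Rows whose weights add up to about 0.8 are excluded by the strict comparison, so floating-point sums near the threshold may need extra care.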

Interleave two columns of a data.frame

I have a data frame like this:
GN SN
a 0.1
b 0.2
c 0.3
d 0.4
e 0.4
f 0.5
I would like the following output:
GN
a
0.1
b
0.2
c
0.3
Can anyone help me? How can I interleave the elements of the second column with the elements of the first column to get the desired output?
First let's create some data:
dd = data.frame(x = 1:10, y = LETTERS[1:10])
Next, we need to make sure the y column is a character and not a factor (otherwise, it will be converted to a numeric)
dd$y = as.character(dd$y)
Then we transpose the data frame and convert to a vector:
as.vector(t(dd))
However, a more pertinent question is why you would want to do this.
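
For comparison, the same interleaving idea can be sketched in Python: pair the two columns element by element, then flatten the pairs. The list names `gn` and `sn` are assumptions that mirror the GN and SN columns from the question. Every value is converted to a string because the result mixes letters and numbers in a single column:

```python
from itertools import chain

gn = ["a", "b", "c", "d", "e", "f"]
sn = [0.1, 0.2, 0.3, 0.4, 0.4, 0.5]

# zip pairs each GN entry with its SN entry; chain.from_iterable flattens
# the pairs in order, giving GN1, SN1, GN2, SN2, ...
interleaved = [str(item) for item in chain.from_iterable(zip(gn, sn))]
print(interleaved)
```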

R populating a vector [duplicate]

I have a vector of zeros, say of length 10. So
v = rep(0,10)
I want to populate some values of the vector, using a set of indexes in v1 and another vector v2 that holds the values in sequence. For example, v1 has the indexes
v1 = c(1,2,3,7,8,9)
and
v2 = c(0.1,0.3,0.4,0.5,0.1,0.9)
In the end I want
v = c(0.1,0.3,0.4,0,0,0,0.5,0.1,0.9,0)
So the indexes in v1 got mapped from v2 and the remaining ones were 0. I can obviously write a for loop, but that's taking too long in R, owing to the length of the actual matrices. Is there a simple way to do this?
You can assign it this way:
v[v1] = v2
For example:
> v = rep(0,10)
> v1 = c(1,2,3,7,8,9)
> v2 = c(0.1,0.3,0.4,0.5,0.1,0.9)
> v[v1] = v2
> v
[1] 0.1 0.3 0.4 0.0 0.0 0.0 0.5 0.1 0.9 0.0
You can also do it with replace, which returns a modified copy and leaves v itself unchanged:
v = rep(0,10)
v1 = c(1,2,3,7,8,9)
v2 = c(0.1,0.3,0.4,0.5,0.1,0.9)
replace(v, v1, v2)
[1] 0.1 0.3 0.4 0.0 0.0 0.0 0.5 0.1 0.9 0.0
See ?replace for details.
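
As a rough cross-check of the index mapping, here is the same assignment sketched in Python. Python lists are 0-based, so each 1-based position from the R example is shifted by one. The explicit loop is only for illustration and is not a performance recommendation:

```python
v = [0.0] * 10
v1 = [1, 2, 3, 7, 8, 9]  # 1-based positions, as in the R example
v2 = [0.1, 0.3, 0.4, 0.5, 0.1, 0.9]

# Pair each position with its value and shift to 0-based indexing.
for position, value in zip(v1, v2):
    v[position - 1] = value

print(v)
```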

How to summarize multiple files into one file based on an assigned rule?

I have about 100 files in the following format. Each file has its own name, but all of them are saved in the same directory. For example, filecd looks as follows:
A B C D
ab 0.3 0.0 0.2 0.20
cd 0.7 0.0 0.3 0.77
ef 0.8 0.1 0.5 0.91
gh 0.3 0.5 0.6 0.78
fileabb is as follows:
A B C D
ab 0.3 0.9 1.0 0.20
gh 0.3 0.5 0.6 0.9
All these files have the same number of columns but different numbers of rows.
For each file, I want to summarize the contents as one row (0 if all cells in a column are < 0.8; 1 if ANY cell in the column is greater than or equal to 0.8), and save the summarized results in a separate CSV file as follows:
A B C D
filecd 1 0 0 1
fileabb 0 1 1 1
..... till 100
Instead of reading and processing each file separately, can this be done efficiently in R? Could you help me with how to do so? Thanks.
For ease of discussion, I have added the following lines as sample input files:
file1 <- data.frame(A=c(0.3, 0.7, 0.8, 0.3), B=c(0,0,0.1,0.5), C=c(0.2,0.3,0.5,0.6), D=c(0.2,0.77,0.91, 0.78))
file2 <- data.frame(A=c(0.3, 0.3), B=c(0.9,0.5), C=c(1,0.6), D=c(0.2,0.9))
Please kindly give me some more advice. Many thanks.
First make a vector of all the filenames.
filenames <- dir(your_data_dir, full.names = TRUE) #full paths so read.delim can find the files; you may also need the pattern argument
Then read the data into a list of data frames.
data_list <- lapply(filenames, function(fn) as.matrix(read.delim(fn)))
#maybe with other arguments passed to read.delim
Now calculate the summary.
summarised <- lapply(data_list, function(dfr)
{
#MARGIN = 2 applies the function to each column; as.integer gives the 0/1 flags
apply(dfr, 2, function(col) as.integer(any(col >= 0.8)))
})
Convert this list into a matrix.
summary_matrix <- do.call(rbind, summarised)
Make the row names match the file names.
rownames(summary_matrix) <- basename(filenames)
Now write out to CSV.
write.csv(summary_matrix, "my_summary_matrix.csv")
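
The per-column rule can be checked against the two sample files from the question with a short Python sketch. The `summarise` helper and the in-memory `files` dictionary are assumptions used in place of reading real files from disk, and the CSV is written to a string buffer rather than to a file:

```python
import csv
import io


def summarise(table, threshold=0.8):
    """Return one 0/1 flag per column: 1 if any cell is >= threshold."""
    columns = zip(*table)  # transpose rows into columns
    return [int(any(cell >= threshold for cell in column)) for column in columns]


# Sample data copied from the question (row labels omitted).
files = {
    "filecd": [
        [0.3, 0.0, 0.2, 0.20],
        [0.7, 0.0, 0.3, 0.77],
        [0.8, 0.1, 0.5, 0.91],
        [0.3, 0.5, 0.6, 0.78],
    ],
    "fileabb": [
        [0.3, 0.9, 1.0, 0.20],
        [0.3, 0.5, 0.6, 0.9],
    ],
}

summary = {name: summarise(table) for name, table in files.items()}

# Write one row per file, with the file name in the first column.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["", "A", "B", "C", "D"])
for name, flags in summary.items():
    writer.writerow([name] + flags)
print(buffer.getvalue())
```

The flags reproduce the expected rows from the question: 1 0 0 1 for filecd and 0 1 1 1 for fileabb.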
