Column sum in torch

How do I sum along a column in torch? I have a 128*1024 tensor, and I want to get a 1*1024 tensor by summing all the rows.
For example:
a:
1 2 3
4 5 6
I want b
5 7 9

To do this, you can use the sum method.
torch.sum(a,1)
In general, you can specify any axis you want to sum over.
torch.sum(a,axis)
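(To sum across each row instead, giving one value per row, use axis=2. This assumes Lua Torch, where dimensions are numbered from 1; in PyTorch, which numbers them from 0, the column sum is torch.sum(a,0).)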

Related

Can I select the rows in my data set that have the same value in 2 of the columns?

I have a data set with 40 columns and 2000 rows. The values in 2 of the columns are important: I want to select the rows that have the same value in these 2 columns.
A small sample of my data looks like this:
2 3 4 5 6 3 23 32
4 3 4 1 0 5 6 43
4 4 3 22 1 2 23
Suppose I want to select the rows that have the same value in the first and third columns. Then I want the second row to be stored in a new data set.
I take from your comments that you have numbers stored as factors in that dataframe. Factors have different internal values. So when the console output shows the factor level to be 4 it is not necessarily a 4 in the internal representation. In general, two different factors are not compatible with each other except if they have the same level set. To see the 'internal representation' of your first column use as.numeric(df[[1]]).
Now to the solution of your problem. You first have to convert the factors in your columns 1 and 3 (or all columns) into numeric values using the factor levels:
## converting factor levels to numeric values
df[[1]] <- as.numeric(levels(df[[1]]))[df[[1]]]
df[[3]] <- as.numeric(levels(df[[3]]))[df[[3]]]
## filter data
df[df[[1]] == df[[3]], ]
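To see why the conversion matters, here is a minimal sketch on a made-up three-row data frame. The column names V1, V2, V3 and the values are assumptions for illustration only, not the asker's real data.

```r
# Hypothetical data: numbers stored as factors, as described above
df <- data.frame(
  V1 = factor(c("2", "4", "4")),
  V2 = factor(c("3", "3", "4")),
  V3 = factor(c("4", "4", "3"))
)

# Internal integer codes of column 1; these are NOT the displayed values
codes <- as.numeric(df[[1]])

# Map the codes back to the displayed levels and convert those to numbers
df[[1]] <- as.numeric(levels(df[[1]]))[df[[1]]]
df[[3]] <- as.numeric(levels(df[[3]]))[df[[3]]]

# Keep the rows where columns 1 and 3 hold the same value
same <- df[df[[1]] == df[[3]], ]
print(codes)
print(same)
```

Here codes holds level positions rather than the numbers you see printed, which is why comparing the raw factors can mislead.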

R: Sample unique X, Y pairs without duplication of Xs and Ys

I am having a hard time figuring how to program this in R: Given a number of X and Y pairs, such as
X Y
9 1
1 2
12 3
8 4
9 4
4 5
16 6
18 7
5 8
11 9
4 10
6 11
6 12
14 13
18 13
20 13
13 14
20 15
20 16
I need to sample randomly n pairs that fulfil the condition that Xs and Ys are unique. For instance, if n=3 and using the data above, the following combinations (9,1) (4,5) (4,10) or (1,2) (14,13) (20,13) will be invalid because X=4 or Y=13 are duplicated in each of the solutions. However, (9,1) (1,2) and (8,4) will be a valid solution because the Xs and Ys are unique. Any help will be moooooost welcome.
If you start by shuffling the rows of your original data, then keep only the rows where neither X nor Y is duplicated, and then select the first, last or any n (=3) of the remaining rows (you could use sample again), you should be fine, I think.
set.seed(1) # for reproducibility
head(subset(df[sample(nrow(df)),], !duplicated(X) & !duplicated(Y)), 3)
# X Y
#6 4 5
#7 16 6
#10 11 9
In response to the comment by @Richo64, saying that this approach will not randomly select the pairs:
It does sample the pairs randomly because the first (innermost) thing I do is
df[sample(nrow(df)),]
which samples the rows of the data randomly. Once we have done that, it is random which of the two 4s in column X, say, comes first and therefore remains in the data; the other 4 is removed because it is a duplicated entry in X.
The same applies to values in Y.
It follows that after the sampling and subsetting you are free to choose any 3 rows of the remaining data. Even if you always selected the first 3 rows, you would still get a random selection that differs every time you run it (except when it coincidentally samples the same rows again).
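As a rough check of the approach, the sketch below rebuilds the data from the question and wraps the shuffle-then-filter step in a helper. The function name sample_unique_pairs is made up for this example.

```r
# Data from the question
df <- data.frame(
  X = c(9, 1, 12, 8, 9, 4, 16, 18, 5, 11, 4, 6, 6, 14, 18, 20, 13, 20, 20),
  Y = c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 13, 14, 15, 16)
)

# Hypothetical helper: shuffle the rows, drop repeated Xs and Ys, keep n rows
sample_unique_pairs <- function(df, n) {
  shuffled <- df[sample(nrow(df)), ]
  keep <- !duplicated(shuffled$X) & !duplicated(shuffled$Y)
  head(shuffled[keep, ], n)
}

set.seed(1)  # for reproducibility
res <- sample_unique_pairs(df, 3)
print(res)
```

Whatever the seed, the returned rows have no repeated X and no repeated Y, because both duplicated() filters are applied to the same shuffled order.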

Efficient method of obtaining successive high values of data.frame column

Let's say I have the following data.frame in R
df <- data.frame(order=(1:10),value=c(1,7,3,5,9,2,9,10,2,3))
Other than looping through the data and testing whether value exceeds the previous high value, how can I get successive high values so that I end up with a table like this
order value
1 1
2 7
5 9
8 10
TIA
Here's one option, if I understood the question correctly:
df[df$value > cummax(c(-Inf, head(df$value, -1))),]
# order value
#1 1 1
#2 2 7
#5 5 9
#8 8 10
I use cummax to keep track of the running maximum of column "value" and compare each "value" entry to the previous row's cummax. To make sure the first entry is also selected, I start with -Inf.
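To make the comparison visible, this sketch splits the one-liner into two steps. The intermediate names prev_max and highs are introduced here only for illustration.

```r
df <- data.frame(order = 1:10, value = c(1, 7, 3, 5, 9, 2, 9, 10, 2, 3))

# Running maximum of all values BEFORE each row; -Inf stands in for "nothing yet"
prev_max <- cummax(c(-Inf, head(df$value, -1)))

# A row is a new high when its value strictly exceeds everything before it
highs <- df[df$value > prev_max, ]
print(prev_max)
print(highs)
```

The strict comparison means the second 9 (row 7) is not reported again, since it only ties the earlier maximum.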
"get successive high values (of value?)" is unclear.
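It seems you want to keep only the rows whose value is higher than the previous max.
First, we reorder your df in increasing order of value (it's not clear, but I think that's what you wanted).
Then we use logical indexing with diff()>0 to only include strictly increasing rows. Note that diff() returns one element fewer than there are rows, so the index is recycled for the last row, and for this data the result keeps one row per distinct value rather than the running maxima shown in the question: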
rdf <- df[order(df$value),]
rdf[ diff(rdf$value)>0, ]
# order value
#1 1 1
#9 9 2
#10 10 3
#4 4 5
#2 2 7
#7 7 9
#8 8 10

How to get the square of a number for each cell in data frame in R

I have a data frame in R with 3 columns, A, B and C
A B C
2 3 4
5 2 7
I want to get the square of each number, like this
A B C
4 9 16
25 4 49
Can anyone please help me out? I can do this in Excel but want to do it in R.
Just do this. In R, ^ works element-wise whether its argument is a number, vector, matrix or data frame.
dataframe^2
Squaring a data.frame already returns a data.frame. If your data is stored as a matrix and you want the result as a data.frame, do
data.frame(dataframe^2)
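For instance, with the numbers from the question (the object names dataframe and squared are just placeholders):

```r
# The example data from the question
dataframe <- data.frame(A = c(2, 5), B = c(3, 2), C = c(4, 7))

# ^ is applied to every cell; column names are preserved
squared <- dataframe^2
print(squared)
```

Note that this only works when every column is numeric.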

Re-sample a data frame with panel dimension

I have a data set consisting of 2000 individuals. For each individual i = 1, ..., 2000, the data set contains n repeated situations. Letting d denote this data set, each row of d is indexed by i and n. Among other variables, d has a variable pid which takes an identical value for an individual across different rows (situations).
Taking into consideration the panel nature of the data, I want to re-sample d (as in a bootstrap), with replacement, and store each re-sampled data set as a data frame.
I considered using the sample function but could not make it work. I am a new user of R and have no programming skills.
The data set consists of many variables, but all the variables have numeric values. The data set is as follows.
pid x y z
1 10 2 -5
1 12 3 -4.5
1 14 4 -4
1 16 5 -3.5
1 18 6 -3
1 20 7 -2.5
2 22 8 -2
2 24 9 -1.5
2 26 10 -1
2 28 11 -0.5
2 30 12 0
2 32 13 0.5
The first six rows are for the first person, for which pid=1, and the next six rows, with pid=2, are different observations for the second person.
This should work for you:
z <- replicate(100,
               d[d$pid %in% sample(unique(d$pid), 2000, replace=TRUE),],
               simplify = FALSE)
The result z will be a list of dataframes you can do whatever with.
EDIT: this is a little wordy, but will deal with duplicated rows. replicate performs a given operation a set number of times (in the example below, 4). I then sample the unique values of pid (in this case 3 of those values, with replacement) and extract the rows of d corresponding to each sampled value. The combination of do.call, rbind and lapply deals with the duplicates that are not handled well by the above code, because %in% matches each pid at most once no matter how often it was sampled. Thus, instead of generating data frames with potentially different lengths, this code generates a data frame for each sampled pid and then uses do.call("rbind",...) to stick them back together within each iteration of replicate.
z <- replicate(4,
               do.call("rbind",
                       lapply(sample(unique(d$pid), 3, replace=TRUE),
                              function(x) d[d$pid==x,])),
               simplify=FALSE)
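Here is a small self-contained sketch of the same idea on the 12-row example from the question. The helper name resample_panel and its n_ids argument are made up for this example; by default it draws as many individuals as the data contains.

```r
# Example data from the question: two individuals, six rows each
d <- data.frame(
  pid = rep(1:2, each = 6),
  x   = seq(10, 32, by = 2),
  y   = 2:13,
  z   = seq(-5, 0.5, by = 0.5)
)

# Hypothetical helper: draw individuals with replacement, then stack
# all rows of each drawn individual (repeats are kept as repeats)
resample_panel <- function(d, n_ids = length(unique(d$pid))) {
  ids <- sample(unique(d$pid), n_ids, replace = TRUE)
  do.call("rbind", lapply(ids, function(id) d[d$pid == id, ]))
}

set.seed(1)  # for reproducibility
resamples <- replicate(4, resample_panel(d), simplify = FALSE)
print(sapply(resamples, nrow))
```

Because whole individuals are drawn, every re-sample here has 12 rows, and the same pid can appear twice in one re-sample.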
