Creating variables from a matrix in R

I'm trying to create a set of variables from a matrix. This is my code:
matrix<-cbind(paste("a",letters[1:11],sep=""),
paste("b",letters[1:11],sep=""),
paste("c",letters[1:11],sep=""),
paste("d",letters[1:11],sep=""),
paste("e",letters[1:11],sep=""),
paste("f",letters[1:11],sep=""),
paste("g",letters[1:11],sep=""),
paste("h",letters[1:11],sep=""),
paste("i",letters[1:11],sep=""),
paste("j",letters[1:11],sep=""),
paste("k",letters[1:11],sep=""))
So I've got a matrix with all the combinations of the letters: aa, ab, ac, and so on.
What can I do if I want to create variables with the same names and assign a value to each?
for example
aa<-0
ab<-0
and so on. Is there a way to do this automatically? Thanks.

Consider this alternate strategy:
> m <- matrix(NA, 10, 10, dimnames=list(letters[1:10], letters[1:10]) )
> m[] <- outer(1:10, 1:10, FUN="-")
> m
a b c d e f g h i j
a 0 -1 -2 -3 -4 -5 -6 -7 -8 -9
b 1 0 -1 -2 -3 -4 -5 -6 -7 -8
c 2 1 0 -1 -2 -3 -4 -5 -6 -7
d 3 2 1 0 -1 -2 -3 -4 -5 -6
e 4 3 2 1 0 -1 -2 -3 -4 -5
f 5 4 3 2 1 0 -1 -2 -3 -4
g 6 5 4 3 2 1 0 -1 -2 -3
h 7 6 5 4 3 2 1 0 -1 -2
i 8 7 6 5 4 3 2 1 0 -1
j 9 8 7 6 5 4 3 2 1 0
Now you can access a single element with a letter pair:
m['d','f']
[1] -2
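If the two-letter names themselves are needed as keys, a minimal sketch along the following lines may help. It assumes the values are numeric and start at 0, as in the question; the names nm, vals and env are hypothetical and chosen only for illustration. A named vector keeps all values in one object, while assign() creates real variables (placed in a separate environment here so the workspace is not filled with 121 objects).

```r
# Rebuild the 11 x 11 matrix of names from the question; it is called `nm`
# here (a hypothetical name) to avoid masking base::matrix.
nm <- sapply(letters[1:11], function(p) paste0(p, letters[1:11]))

# Option 1: one named numeric vector holding every value, all starting at 0.
vals <- setNames(rep(0, length(nm)), as.vector(nm))
vals[["ab"]] <- 5        # update a single entry by its name
vals[["ab"]]

# Option 2: separate variables, created with assign() in their own
# environment (an assumption made here to keep the global workspace clean).
env <- new.env()
for (name in as.vector(nm)) assign(name, 0, envir = env)
get("ab", envir = env)
```

The named vector is usually easier to loop over and to pass to functions than a large set of individual variables.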

Related

Trying to fit sentiment score in time series analysis in R

I have extracted 4383186 tweets about Bitcoin from Twitter. After running sentiment analysis and taking the first difference (for stationarity), I am left with 2623105 observations. When I run the following code:
First_diff$Sentiment_score
result:
[1] -2 1 -1 2 0 0 -1 1 0 -1 -1 -5 5 -2 3 1 -6 5 0 -3 5 -2 0 0
[25] -4 1 2 1 3 -7 6 -3 -2 -1 2 2 1 -4 1 -2 3 1 0 3 -3 -1 3 -8
[49] 6 0 -1 -2 7 -3 0 -2 2 -4 3 -1 -2 1 5 -7 3 1 -1 0 5 -5 1 -1
[73] 0 -2 0 -1 4 1 -1 -1 3 -2 -6 6 -1 2 1 -8 1 2 -3 6 1 1 3 -7
[97] 2 7 -10 5 -4 -1 -3 11 -6 0 8 -7 2 -2 -3 3 5 0 -5 5 -5 0 -2 1
[121] 6 -6 4 -4 4 -6 4 1 -6 4 -3 2 3 -8 5 0 3 0 -6 3 2 -1 0 0
[145] 3 -7 4 -3 3 0 -1 -4 10 -6 6 -5 -1 1 0 -1 0 0 0 0 1 1 0 -5
[169] 4 1 2 -6 0 5 -3 -4 5 1 1 -2 -1 0 -2 2 6 -6 0 5 -3 -4 2 -2
[193] -3 5 0 0 -2 4 0 -2 0 -3 5 -6 4 8 -6 -1 0 -1 -5 4 1 -2 4 -1
[217] 0 0 0 -1 1 7 -8 -4 4 3 -6 2 -4 5 2 0 -2 0 4 -3 1 -2 1 -1
[241] -2 -1 4 0 -1 1 1 -4 2 -3 7 -4 -2 3 5 -6 1 0 0 -1 0 0 0 -2
[265] 3 -3 2 -2 2 4 -4 3 -3 1 -5 5 0 4 -3 -1 -3 5 -2 0 2 -7 2 2
[289] -2 7 -5 -3 4 -2 2 -1 4 -4 2 -2 4 -7 3 3 -2 -1 1 -6 4 1 -2 -1
[313] 3 -3 4 -3 0 -3 4 1 3 -4 2 -1 2 -5 0 4 -4 0 2 -1 -1 4 -6 6
[337] -1 1 -1 -2 2 -2 2 -1 -2 2 1 -3 3 -3 3 1 0 -1 1 0 -1 3 -4 2
[361] -2 0 -4 3 4 -2 2 -4 2 -2 2 0 0 6 -5 -1 -5 6 0 -1 -1 2 1 -3
[385] 1 2 -2 1 2 -2 -5 4 0 -2 -3 4 -1 -1 8 -10 3 0 2 1 -3 1 -1 3
[409] -5 -1 3 2 0 5 -5 1 -3 -3 0 5 2 0 -4 3 0 -6 3 6 -5 -2 2 4
[433] -3 -3 -1 1 4 -6 6 0 2 -2 -1 -3 5 -2 -2 3 -2 -1 1 -1 -2 4 5 -5
The above is just a small sample of the whole data
head(First_diff$Sentiment_score)
result:
[1] -2 1 -1 2 0 0
This seems odd, since I have more than 2 million observations.
Nevertheless, I tried plotting the data as a time series with the following code, but the plot comes back as a solid black square covering the whole graph.
ts_tweet <- ts(First_diff$Sentiment_score, frequency = ..)
plot.ts(ts_tweet)
I tried several different values for the frequency argument, but the result did not improve. What am I doing wrong, or what am I forgetting to do? First_diff is a data frame with a sentiment score column and a date column. The date column runs from 2021-02-05 10:52:04 to 2022-10-15 23:59:59.
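Two things may be worth checking here. First, head() returns only the first six elements by default, whatever the length of the vector. Second, drawing millions of noisy points in one plot tends to produce a solid band, so one option is to aggregate before plotting. The sketch below is not run on the real tweets: it simulates a small stand-in for First_diff, and the column name date and the choice of a daily mean are assumptions.

```r
# Hypothetical stand-in for First_diff: a POSIXct `date` column and a numeric
# `Sentiment_score` column (simulated values, not the real tweet data).
set.seed(42)
n <- 10000
First_diff <- data.frame(
  date = as.POSIXct("2021-02-05 10:52:04", tz = "UTC") +
    sort(runif(n, 0, 600 * 86400)),
  Sentiment_score = round(rnorm(n, sd = 3))
)

# head() shows 6 elements by default; length() reports the full size.
head(First_diff$Sentiment_score, n = 20)
length(First_diff$Sentiment_score)

# Average the scores per calendar day, then plot one point per day.
First_diff$day <- as.Date(First_diff$date)
daily <- aggregate(Sentiment_score ~ day, data = First_diff, FUN = mean)
plot(daily$day, daily$Sentiment_score, type = "l",
     xlab = "Date", ylab = "Mean daily sentiment score")
```

Whether a daily mean is the right summary depends on the analysis; aggregating the raw scores first and differencing afterwards may be more meaningful than averaging already differenced values.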

in R: how to take value from i+1th row of 1 dataframe and subtract from every row in i+1th column of 2nd dataframe

Note that the actual dataset has 1000s of columns and 100s of rows, so I am looking for a way that does not require me to manually name either columns or rows.
With a dataset that has similar structure as follows:
subvalues <- c(1:10)
df <- data.frame(x = rpois(40,2), y = rpois(40,2), z = rpois(40,2), q = rpois(40,2), t = rpois(40,2))
Call the elements of subvalues SVa, SVb, SVc...
Call the rows of the data frame's columns Xa, Xb, Xc... Ya, Yb, Yc... etc.
What I am trying to build is a function that first takes the first element of subvalues (SVa) and subtracts it from every row in column x (Xa, Xb, Xc, etc.), then takes the second element of subvalues (SVb) and subtracts it from every row in column y (Ya, Yb, Yc, etc.), and so on.
What I have so far is:
res <- numeric(length = length(x))
for (i in seq_along(x)) {
res[i] <- xpos - [**SVi+1**]
}
res
I need to figure out the 'SVi+1' part and how to properly write the loop within a loop.
Any help is much appreciated.
The example dataset you provide won't work, because the length of subvalues must equal the number of columns in df.
After some modifications, here is an example. You don't need to extract the value from subvalues separately, as it's just a subtraction.
Note that I've saved df in tmp, to modify this data.frame without losing your initial data. Also, if the entire data.frame is numeric, consider using a matrix, which can save you time.
subvalues <- c(1:5) # Note here the length 5 for the 5 columns of df.
df <- data.frame(x = rpois(40,2), y = rpois(40,2), z = rpois(40,2), q = rpois(40,2), t = rpois(40,2))
tmp <- df
for(i in seq_along(subvalues)){
# print(subvalues[i])
tmp[,i] <- tmp[,i] - subvalues[i]
}
tmp[,i] returns column i of the data.frame as a vector, so you can subtract a value from that vector and save the result back in its original place.
Maybe you can try replicate to create a matrix with the same dimensions as df, and do the subtraction afterwards, i.e.,
dfout <- df - t(replicate(nrow(df),subvalues))
such that
> dfout
x y z q t
1 0 1 -1 2 -4
2 0 0 0 -2 -1
3 1 1 -2 -2 -3
4 3 0 -2 -3 -2
5 0 0 0 -1 -1
6 3 1 -2 -2 -3
7 3 -2 0 -2 -5
8 1 0 -3 -3 -4
9 1 1 -2 -3 -2
10 -1 1 -2 -2 -4
11 0 0 -2 -2 -3
12 0 2 -3 -4 -2
13 2 0 -1 -4 -2
14 0 -1 1 -2 -4
15 2 -2 0 0 -4
16 1 -2 0 -2 -1
17 2 -1 -1 -2 -3
18 5 0 -1 -2 -2
19 0 0 0 2 -3
20 2 0 -1 -2 -1
21 3 2 -1 -1 -4
22 0 -1 -2 -2 -4
23 1 0 -2 -3 -1
24 -1 -1 3 -3 -3
25 0 0 -1 -1 -1
26 0 -1 -2 -2 -4
27 -1 0 -3 -3 -2
28 0 1 -1 -1 -2
29 3 -2 1 -4 -1
30 0 2 -1 0 -3
31 1 -1 2 -2 -2
32 1 1 0 -2 -4
33 1 -1 -2 -3 -5
34 0 -1 -1 -2 -1
35 2 0 -2 -2 -4
36 1 2 -3 -3 -3
37 2 2 0 -2 -5
38 -1 -1 -3 -4 -2
39 2 1 -1 -3 -4
40 1 3 -1 -3 -2
DATA
set.seed(1)
subvalues <- c(1:5) # Note here the length 5 for the 5 columns of df.
df <- data.frame(x = rpois(40,2), y = rpois(40,2), z = rpois(40,2), q = rpois(40,2), t = rpois(40,2))
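As a possible alternative to the explicit loop and to replicate(), base R's sweep() subtracts one value per column in a single call. The sketch below reuses the DATA block above and checks the result against the loop version; the names swept and tmp are only illustrative.

```r
set.seed(1)
subvalues <- 1:5   # one value per column of df
df <- data.frame(x = rpois(40, 2), y = rpois(40, 2), z = rpois(40, 2),
                 q = rpois(40, 2), t = rpois(40, 2))

# MARGIN = 2 means "by column": subvalues[j] is subtracted from column j.
swept <- sweep(df, 2, subvalues, FUN = "-")

# Same computation with the explicit loop, for comparison.
tmp <- df
for (i in seq_along(subvalues)) {
  tmp[, i] <- tmp[, i] - subvalues[i]
}

all(swept == tmp)
```

No column or row names are typed by hand, so the same call should carry over to a data frame with thousands of columns, provided subvalues has one entry per column.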

Order columns by year independently in a dataframe in R

Data:
set.seed(0)
Temp <- data.frame(year=rep(1:3,each=4),V1=floor(rnorm(12)*2),V2=floor(rnorm(12)*2))
year V1 V2
1 1 2 -3
2 1 -1 -1
3 1 2 -1
4 1 2 -1
5 2 0 0
6 2 -4 -2
7 2 -2 0
8 2 -1 -3
9 3 -1 -1
10 3 4 0
11 3 1 0
12 3 -2 1
I want to reorder V1 and V2 independently within each year. I can do it with 10 lines, but I believe there must be a more beautiful way to do it.
Desired output:
year V1 V2
1 1 -1 -3
2 1 2 -1
3 1 2 -1
4 1 2 -1
5 2 -4 -3
6 2 -2 -2
7 2 -1 0
8 2 0 0
9 3 -2 -1
10 3 -1 0
11 3 1 0
12 3 4 1
Using dplyr you can do
library(dplyr)
Temp %>%
group_by(year) %>%
mutate(V1=sort(V1), V2=sort(V2))
which returns
# A tibble: 12 x 3
# Groups: year [3]
year V1 V2
<int> <dbl> <dbl>
1 1 -1 -3
2 1 2 -1
3 1 2 -1
4 1 2 -1
5 2 -4 -3
6 2 -2 -2
7 2 -1 0
8 2 0 0
9 3 -2 -1
10 3 -1 0
11 3 1 0
12 3 4 1
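And if you needed to do that with all columns, you could do the following (in dplyr 1.0.0 and later, mutate(across(everything(), sort)) is the recommended equivalent of mutate_all(sort)):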
Temp %>%
group_by(year) %>%
mutate_all(sort)
Using data.table:
library(data.table)
setDT(Temp)[,c("V1","V2"):=list(sort(V1),sort(V2)),year]
If you use plyr and you know the column names, you can easily do this using ddply:
library(plyr)
ddply(Temp, "year", summarize, V1=sort(V1), V2=sort(V2))
year V1 V2
1 1 -1 -3
2 1 2 -1
3 1 2 -1
4 1 2 -1
5 2 -4 -3
6 2 -2 -2
7 2 -1 0
8 2 0 0
9 3 -2 -1
10 3 -1 0
11 3 1 0
12 3 4 1
If you don't know the column names, you'd have to make a function to do it:
> ddply(Temp, "year", function(x) { as.data.frame(lapply(x, sort)) })
year V1 V2
1 1 -1 -3
2 1 2 -1
3 1 2 -1
4 1 2 -1
5 2 -4 -3
6 2 -2 -2
7 2 -1 0
8 2 0 0
9 3 -2 -1
10 3 -1 0
11 3 1 0
12 3 4 1
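For completeness, the same result can probably be reached without any package. The sketch below uses base R's ave(), which applies a function within groups and returns the values in their original row positions. Temp is typed in by hand from the table in the question so that the example does not depend on the random number generator; the names cols and out are hypothetical.

```r
# The data from the question, entered literally.
Temp <- data.frame(
  year = rep(1:3, each = 4),
  V1 = c(2, -1, 2, 2, 0, -4, -2, -1, -1, 4, 1, -2),
  V2 = c(-3, -1, -1, -1, 0, -2, 0, -3, -1, 0, 0, 1)
)

# Sort every non-grouping column independently within each year.
cols <- setdiff(names(Temp), "year")
out <- Temp
out[cols] <- lapply(Temp[cols], function(v) ave(v, Temp$year, FUN = sort))
out
```

Because cols is computed from the column names, this also covers the case where the column names are not known in advance.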

building matrix out of a vector with the difference of each value

dataset2 <- data.frame(bird=c("A","B","C","D","E","F"), rank=c(1:6))
I have this example dataset and now I want to build a 6 x 6 matrix with the rank difference between each pair of birds. How can I do this?
Is this what you want?
m <- with(dataset2, outer(rank, rank, '-'))
rownames(m) <- colnames(m) <- dataset2$bird
# A B C D E F
# A 0 -1 -2 -3 -4 -5
# B 1 0 -1 -2 -3 -4
# C 2 1 0 -1 -2 -3
# D 3 2 1 0 -1 -2
# E 4 3 2 1 0 -1
# F 5 4 3 2 1 0
You might also want to do this afterwards:
m[upper.tri(m)] <- 0
tail(m[,-ncol(m)],-1)
To get:
# A B C D E
#B 1 0 0 0 0
#C 2 1 0 0 0
#D 3 2 1 0 0
#E 4 3 2 1 0
#F 5 4 3 2 1
This is kind of the definition of the distance matrix, no?
dist(dataset2, method="maximum")
####
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
With the distinction that it returns positive distances only... maybe that doesn't suit the OP.
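The relationship between the two answers can be checked directly: the distance matrix should equal the absolute value of the signed difference matrix. In this sketch only the numeric rank column is passed to dist(), an assumption made here so that the character bird column is not coerced to numbers; the names signed and d are illustrative.

```r
dataset2 <- data.frame(bird = c("A", "B", "C", "D", "E", "F"), rank = 1:6)
birds <- as.character(dataset2$bird)

# Signed differences: row bird's rank minus column bird's rank.
signed <- with(dataset2, outer(rank, rank, "-"))
dimnames(signed) <- list(birds, birds)

# Unsigned distances, expanded from the dist object to a full square matrix.
d <- as.matrix(dist(dataset2$rank, method = "maximum"))
dimnames(d) <- list(birds, birds)

# The distance matrix is the absolute value of the signed matrix.
all(d == abs(signed))
```

So outer() is the one to use when the direction of the difference matters, and dist() when only its magnitude does.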

Replace a column data with another column of data in a data frame while replacing prior instances <0 by 0

I have a data frame
x<-c(1,3,0,2,4,5,0,-2,-5,1,0)
y<-c(-1,-2,0,3,4,5,1,8,1,0,2)
data.frame(x,y)
x y
1 1 -1
2 3 -2
3 0 0
4 2 3
5 4 4
6 5 5
7 0 1
8 -2 8
9 -5 1
10 1 0
11 0 2
I would like to replace the data in column y with the data from column x, except that the rows where y was < 0 should be set to 0 in y. This will result in the following data frame
data.frame(x,y)
x y
1 1 0
2 3 0
3 0 0
4 2 2
5 4 4
6 5 5
7 0 0
8 -2 -2
9 -5 -5
10 1 0
11 0 0
Thanks
x<-c(1,3,0,2,4,5,0,-2,-5,1,0)
y<-c(-1,-2,0,3,4,5,1,8,1,0,2)
df <- data.frame(x, y)
df$y <- ifelse(y<0,0,x)
df
# x y
# 1 1 0
# 2 3 0
# 3 0 0
# 4 2 2
# 5 4 4
# 6 5 5
# 7 0 0
# 8 -2 -2
# 9 -5 -5
# 10 1 1
# 11 0 0
In one line:
> df <- transform(data.frame(x,y), y = ifelse(y<0,0,x))
> df
x y
1 1 0
2 3 0
3 0 0
4 2 2
5 4 4
6 5 5
7 0 0
8 -2 -2
9 -5 -5
10 1 1
11 0 0
Note that the resulting data differs from the reference result you provide on record 10. I suspect that this might be because you applied the condition <= 0 rather than < 0? Otherwise the 1 would be carried across from the x field for this record.
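The discrepancy on record 10 can be reproduced by running both conditions side by side. This is a small sketch using the x and y vectors from the question; the names strict and loose are hypothetical.

```r
x <- c(1, 3, 0, 2, 4, 5, 0, -2, -5, 1, 0)
y <- c(-1, -2, 0, 3, 4, 5, 1, 8, 1, 0, 2)

strict <- ifelse(y < 0, 0, x)    # condition as stated in the question
loose  <- ifelse(y <= 0, 0, x)   # condition that matches the reference result

# Rows where the two conditions disagree.
which(strict != loose)

# Side-by-side view of both results.
data.frame(x, y_strict = strict, y_loose = loose)
```

Only row 10 differs, because it is the only row where y is exactly 0 while x is non-zero.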
Given your x and y vectors, create the data.frame in one swift move:
> data.frame(x, y=ifelse(y < 0, 0, x))
x y
1 1 0
2 3 0
3 0 0
4 2 2
5 4 4
6 5 5
7 0 0
8 -2 -2
9 -5 -5
10 1 1
11 0 0
