I was wondering about the following thing:
I have a 16x2 matrix. Both columns hold numerical values, but the values in the second column are actually position numbers, so they need to be treated as a factor.
I want to order the values from the first column from low to high but I need the numbers of the second column to stay with their original partner value from the first column.
So let's say you've got:
4 1
6 2
2 3
And now I want to sort the first column from low to high.
Then I want to get
2 3
4 1
6 2
Does anybody know how I can do this?
R doesn't seem to provide a variable type for paired data...
You can do:
dat[order(dat[, 1]), ]
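As a minimal sketch using the three-row example from the question (dat is the name used in the answer, and the values are just the question's toy data): order() returns the row permutation that sorts the first column, and indexing the rows with it keeps each pair together.

```r
# Toy matrix from the question: values in column 1, positions in column 2
dat <- matrix(c(4, 6, 2,
                1, 2, 3), ncol = 2)

# order() gives the row indices that sort column 1 in ascending order
idx <- order(dat[, 1])

# Reordering whole rows keeps each value with its original partner
sorted <- dat[idx, ]
print(sorted)
```

For a high-to-low sort, order(dat[, 1], decreasing = TRUE) should work the same way.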
I have two unevenly-spaced time series that each measure separate attributes of the same system. The two series' data points are not sampled at the same times, and the series are not the same length. I would like to match each row from series A to the row of B that is closest to it in time. What I have in mind is to add a column to A that contains the index of the closest row in B. Both series have a time column measured in Unix time (e.g. 1459719755).
for example, given two datasets
a time
2 1459719755
4 1459719772
3 1459719773
b time
45 1459719756
2 1459719763
13 1459719766
22 1459719774
The first dataset should be updated to
a time index
2 1459719755 1
4 1459719772 4
3 1459719773 4
since B[1,]$time has the closest value to A[1,]$time, B[4,]$time has the closest value to A[2,]$time and A[3,]$time.
Is there any convenient way to do this?
Try something like this:
(1+ecdf(bdat$time)(adat$time)*nrow(bdat))
[1] 1 4 4
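Why should this work? The ecdf function returns another function whose value lies between 0 and 1. That function gives the "position" in the "probability range" [0,1] of a new value within the distribution of values passed as the first argument to ecdf, i.e. the fraction of bdat$time values that are less than or equal to it. Multiplying by nrow(bdat) turns that fraction into a count, and adding 1 gives the index of the first row of bdat (assuming bdat is sorted by time) whose time is greater than the new value. Note that this is the next row in time rather than necessarily the nearest one, and that it evaluates to nrow(bdat)+1 for any time at or beyond the last value in bdat$time, so check the result against your data. (I think it's flipping elegant.)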
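To make the intermediate steps visible, here is a small sketch that rebuilds the question's two datasets (as adat and bdat, the names used in the answer) and evaluates the expression piece by piece. The extra time point at the end is a made-up value, not from the question, added only to show how the result behaves when the nearest row is the earlier one.

```r
# Example data from the question
adat <- data.frame(a = c(2, 4, 3),
                   time = c(1459719755, 1459719772, 1459719773))
bdat <- data.frame(b = c(45, 2, 13, 22),
                   time = c(1459719756, 1459719763, 1459719766, 1459719774))

Fn   <- ecdf(bdat$time)          # step function: fraction of bdat$time <= x
frac <- Fn(adat$time)            # fractions in [0, 1]
idx  <- 1 + frac * nrow(bdat)    # rescaled to row positions
print(idx)

# Hypothetical extra time (an assumption for illustration): 1459719757 is
# one second after row 1 of bdat and six seconds before row 2
extra <- 1 + Fn(1459719757) * nrow(bdat)
print(extra)
```

On the question's data the fractions come out as 0, 0.75, 0.75, which gives the indices 1, 4, 4. For the made-up time the expression yields 2, even though row 1 is closer in time.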
Another approach would be to use approxfun on the sorted values of bdat$time, which would then get you interpolated index values. These need to be rounded, because using them directly as indices would instead truncate them to integers.
apf <- approxfun( x=sort(bdat$time), y=seq(length( bdat$time)) ,rule=2)
apf( adat$time)
#[1] 1.000 3.750 3.875
round( apf( adat$time))
#[1] 1 4 4
In both cases you are predicting a sorted value from its "order statistic". In the second case you should check that ties are handled in the manner you desire.
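If what you want is literally the nearest row, a more direct sketch is to take which.min() of the absolute time differences for each row of A. This is a different, brute-force technique from the ecdf and approxfun answers (it compares every pair of times, so it may be slow on long series), and adat/bdat are rebuilt here from the question's example.

```r
# Example data from the question
adat <- data.frame(a = c(2, 4, 3),
                   time = c(1459719755, 1459719772, 1459719773))
bdat <- data.frame(b = c(45, 2, 13, 22),
                   time = c(1459719756, 1459719763, 1459719766, 1459719774))

# For each time in A, find the position of the smallest absolute
# difference to the times in B; which.min() returns the first on ties
adat$index <- sapply(adat$time,
                     function(t) which.min(abs(bdat$time - t)))
print(adat)
```

Unlike the other two approaches, this does not require bdat to be sorted by time.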
I have an R data frame like this one:
a<-c(1,2,3,4,5)
b<-c(6,7,8,9,10)
df<-data.frame(a,b)
colnames(df)<-c("a","b")
df
a b
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
I would like to get the 1st, 2nd, 3rd AND 5th row of the column a, so 1 2 3 5, by selecting rows by their number.
I have tried df$a[1:3,5] but I get Error in df$a[1:3, 5] : incorrect number of dimensions.
What DOES work is c(df$a[1:3],df$a[5]) but I was wondering if there was an easier way to achieve this with R?
Your data frame has two dimensions (rows and columns). When you use square brackets to extract values from it, R expects everything before the comma to indicate the rows desired and everything after the comma to indicate the columns desired (see ?`[`). Hence, df[1:3,5] means rows 1 through 3 from column 5. (df$a, by contrast, is a plain vector with no row/column dimensions, which is why df$a[1:3,5] fails with "incorrect number of dimensions".) To select your desired rows in one go, you need to concatenate them into a single vector (i.e., c(1:3,5)). That all goes before the comma; the column indicator, 1 or "a", goes after the comma. Thus, df[c(1:3,5), 1] is what you need.
As an alternative (which might be more appropriate for a data frame with many more columns), df[c(1:3, 5), "a"], as suggested by @Mamoun Benghezal, would also get it done!
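Putting the pieces together, here is a short sketch on the question's data frame. The variable names rows, by_name and by_vec are only for illustration.

```r
# Data frame from the question
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))

# Build the row selection once as a single vector
rows <- c(1:3, 5)

# Two-dimensional indexing on the data frame: rows before the comma,
# column after it
by_name <- df[rows, "a"]

# One-dimensional indexing on the extracted column vector
by_vec <- df$a[rows]

print(by_name)
print(by_vec)
```

Both forms should return the same vector; the second one is the closest fix to the original df$a[1:3,5] attempt.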
I am using R to analyze a survey. Several of the columns include numbers 1-10, depending on how survey respondents answered the respective questions. I'd like to change the 1-10 scale to a 1-3 scale. Is there a simple way to do this? I was writing a complicated set of for loops and if statements, but I feel like there must be a better way in R.
I'd like to change numbers 1-3 to 1; numbers 4 and 8 to 2; numbers 5-7 to 3, and numbers 9 and 10 to NA.
So in the snippet below, OriginalColumn would become NewColumn.
OriginalColumn=c(4,9,1,10,8,3,2,7,5,6)
NewColumn=c(2,NA,1,NA,2,1,1,3,3,3)
Is there an easy way to do this without a bunch of crazy for loops? Thanks!
You can do this using positional indexing:
> c(1,1,1,2,3,3,3,2,NA,NA)[OriginalColumn]
[1] 2 NA 1 NA 2 1 1 3 3 3
This is easier to read, write, and understand than repeated/nested ifelse calls, and probably faster, because it is a single vectorized lookup. In essence, you're creating a new vector that contains the new value for every value you want to replace. So, for values 1:3 you want 1, thus the first three elements of the vector are 1, and so forth. You then use your original vector to extract the new values based on the positions of the original values.
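Since the question mentions several survey columns, the same lookup can be applied to all of them at once. The sketch below assumes a data frame called survey with columns q1 and q2; those names and the q2 values are made up for illustration, and q1 reuses OriginalColumn from the question.

```r
# Position i of the lookup vector holds the new value for old value i
lookup <- c(1, 1, 1, 2, 3, 3, 3, 2, NA, NA)

# Hypothetical survey data (column names and q2 values are assumptions)
survey <- data.frame(q1 = c(4, 9, 1, 10, 8, 3, 2, 7, 5, 6),
                     q2 = c(10, 1, 5, 8, 2, 9, 4, 3, 6, 7))

# Recode every listed column by indexing the lookup vector with it
cols <- c("q1", "q2")
survey[cols] <- lapply(survey[cols], function(x) lookup[x])
print(survey)
```

This relies on every answer being a whole number from 1 to 10; a 0 or a value above 10 would not be recoded sensibly, so it may be worth checking range() of each column first.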
You could also try
library(car)
recode(OriginalColumn, '1:3=1; c(4,8)=2; 5:7=3; else=NA')
#[1] 2 NA 1 NA 2 1 1 3 3 3
I am trying to merge a data.frame and a column from another data.frame, but have so far been unsuccessful.
My first data.frame [Frequencies] consists of 2 columns, containing 47 upper/lower case alpha characters and their frequency in a bigger data set. For example purposes:
Character<-c("A","a","B","b")
Frequency<-c(100,230,500,420)
The second data.frame [Sequences] is 93,000 rows in length and contains 2 columns, with the 47 same upper/ lower case alpha characters and a corresponding qualitative description. For example:
Character<-c("a","a","b","A")
Descriptor<-c("Fast","Fast","Slow","Stop")
I wish to add the descriptor column to the [Frequencies] data.frame, but not all 93,000 rows! Rather, I only want the descriptor that each "Character" represents. For example:
Character<-c("a")
Frequency<-c("230")
Descriptor<-c("Fast")
The following can also be done:
> merge(adf, bdf[!duplicated(bdf$Character),])
Character Frequency Descriptor
1 a 230 Fast
2 A 100 Fast
3 b 420 Stop
4 B 500 Slow
Why not:
df1$Descriptor <- df2$Descriptor[ match(df1$Character, df2$Character) ]
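As a sketch of how the match() version behaves, here it is on the small example vectors from the question, with df1 standing in for [Frequencies] and df2 for [Sequences] (the names used in the line above). match() returns the position of the first occurrence of each character in df2, so repeated rows in df2 are not a problem.

```r
# Small stand-ins for the two data frames, built from the question's example
df1 <- data.frame(Character = c("A", "a", "B", "b"),
                  Frequency = c(100, 230, 500, 420),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Character = c("a", "a", "b", "A"),
                  Descriptor = c("Fast", "Fast", "Slow", "Stop"),
                  stringsAsFactors = FALSE)

# Position in df2 of the first row matching each character of df1
pos <- match(df1$Character, df2$Character)

# Look up the descriptor at those positions; unmatched characters get NA
df1$Descriptor <- df2$Descriptor[pos]
print(df1)
```

In this toy data "B" never appears in df2, so its descriptor comes out as NA, and df1 keeps all four of its rows.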
What I want to do is assign a value of 1 to the first 1/3 of the observations in my data, a value of 2 to the second 1/3, and finally a value of 3 to the third 1/3.
Taking into account that my data consists of 30 observations, I wrote the following code:
c1 <- c(rep(1,10),rep(2,10),rep(3,10))
which I cbinded to my data
gala2 <- cbind(data,c1)
Then, for the first 10 observations (my first 1/3), the value of c1 is 1, for the next ten observations (second 1/3) the value of c1 is 2 and for the last ten observations (my third 1/3) the value of c1 is 3.
This works perfectly fine, but I wanted to ask if there is a way to do this in a more "abstract" way. That is, to tell R to assign the value of 1 to the first 1/3 of the data, assign the value of 2 to the second 1/3 and the value of 3 to the third 1/3?
Yes, there is: take a look at cut(). To illustrate, try this with your example:
cut(yourDataAsNumeric,3,labels=FALSE)
You can use
sort(rep_len(seq(3), length(c1)))
where c1 is your vector.
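Both suggestions can be written without hard-coding the group size. The sketch below assumes the grouping should follow row position (not the data values), so cut() is applied to the row numbers 1..n; n stands in for nrow(data), and the names by_cut and by_rep are just for illustration.

```r
n <- 30   # number of observations, e.g. nrow(data)

# cut() splits the range of row positions into 3 equal-width intervals
# and, with labels = FALSE, returns the interval number for each row
by_cut <- cut(seq_len(n), breaks = 3, labels = FALSE)

# rep_len() cycles 1, 2, 3 up to length n; sort() then groups them
by_rep <- sort(rep_len(seq(3), n))

print(table(by_cut))
print(table(by_rep))
```

With 30 rows both give ten 1s, ten 2s and ten 3s. When n is not a multiple of 3 the two approaches may split the leftover rows differently, so it is worth checking the table() output.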