Compare columns from two data sets - r

I have two data sets
1 10
2 15
3 17
4 5
The second
1 a
2 b
4 c
I need to compare just one column across the two sets and keep only the rows whose value in that column appears in both sets, excluding the rest. In short, comparing the two columns should give the following result:
1 10
2 15
4 5
I don't know where to start. Could someone help me get started?

We can use subset from base R (assuming the first column is named 'col1' in both datasets):
subset(df1, col1 %in% df2$col1)
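As a minimal sketch of the approach above, here is a runnable version with sample data modeled on the question. The data frame names df1 and df2 follow the answer; the column names col1 and col2 are assumptions, since the question does not name its columns.

```r
# Sample data modeled on the question; col1/col2 are assumed column names.
df1 <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(10, 15, 17, 5))
df2 <- data.frame(col1 = c(1, 2, 4), col2 = c("a", "b", "c"))

# Keep the rows of df1 whose col1 value also occurs in df2$col1.
result <- subset(df1, col1 %in% df2$col1)
print(result)

# Equivalent without subset(), using logical row indexing.
result2 <- df1[df1$col1 %in% df2$col1, ]
```

The row with col1 equal to 3 is dropped because 3 does not occur in df2$col1.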

Related

If I want to sort a column by size in RStudio, how do I make sure that the associated values of the rows sort with the column?

I have a data.frame with 1200 rows and 5 columns, where each row contains 5 values for one person. I need to sort one column by size, but I want the remaining columns to be reordered along with it, so that the one column is in increasing order and the other columns still hold the values of the right persons (i.e., each row still contains data from one and the same person).
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
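These are the column names of my data.frame, and I want to sort it by the column called "avg".
First of all, please always provide a reproducible example such as the one below. Indexing a data frame's rows with order() moves whole rows, so all columns stay aligned with the sorted column. The example sorts by avg in decreasing order; drop the minus sign to sort in increasing order.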
vector <- 1:3
BAPlotDET <- data.frame(vector, vector, vector, vector, vector)
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
fsskiddet fspiddet avg diff absdiff
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
BAPlotDET <- BAPlotDET[order(-BAPlotDET$avg),]
> BAPlotDET
fsskiddet fspiddet avg diff absdiff
3 3 3 3 3 3
2 2 2 2 2 2
1 1 1 1 1 1
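Because every column in the example above holds the same values, it is hard to see that the rows stay intact. The following sketch uses hypothetical values (not from the question) and sorts by avg in increasing order, as the question asks.

```r
# Hypothetical values, chosen so that the effect of sorting is visible.
BAPlotDET <- data.frame(
  fsskiddet = c(11, 12, 13),
  fspiddet  = c(21, 22, 23),
  avg       = c(5.5, 1.2, 3.8),
  diff      = c(0.4, -0.1, 0.3),
  absdiff   = c(0.4, 0.1, 0.3)
)

# order() returns the row positions that put avg in increasing order;
# using them as a row index reorders every column at once.
sorted <- BAPlotDET[order(BAPlotDET$avg), ]
print(sorted)
```

Each row of sorted still belongs to one person: the row with avg 1.2 keeps fsskiddet 12 and fspiddet 22.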

Count certain words in dataframe/matrix in R

I have this data, and I want to count the occurrences (frequencies) of ONE, TWO, and THREE in each column,
e.g. 2 ONEs in the A column, 2 TWOs in the B column, 1 ONE in the C column, etc.
What function can I use to count certain words in R?
And how can I make a histogram out of these counts?
ABC <- read.csv("c:/Data/dataset.csv")
A B C
1 TWO ONE THREE
2 ONE ONE TWO
3 THREE TWO THREE
4 ONE TWO ONE
5 TWO THREE TWO
We can use mtabulate to get the count of unique elements in the dataset by each column
library(qdapTools)
t(mtabulate(ABC))
# A B C
#ONE 2 2 1
#THREE 1 1 2
#TWO 2 2 2
Or we use table, after unlisting the dataset and replicating the names of 'ABC'. Note that here we are calling the table only once.
tbl <- table(unlist(ABC),names(ABC)[col(ABC)])
tbl
# A B C
# ONE 2 2 1
# THREE 1 1 2
# TWO 2 2 2
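A slightly faster option would be to use vapply with tabulate. This assumes every column contains all three words, because factor() derives the levels separately for each column, and the result has no row names.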
vapply(ABC, function(x) tabulate(factor(x)), numeric(3))
If we need a barplot
barplot(tbl, beside=TRUE, legend=TRUE)
df <- data.frame(A = c('TWO','ONE','THREE','ONE','TWO'), B = c('ONE','ONE','TWO','TWO','THREE'), C = c('THREE','TWO','THREE','ONE','TWO'), stringsAsFactors = FALSE)
sapply(df, table)
## A B C
## ONE 2 2 1
## THREE 1 1 2
## TWO 2 2 2
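The per-column approaches above line up only when every column contains every word. A minimal sketch of one way around that is to fix the set of words up front, so a missing word is counted as 0. Column D and the vector name words are hypothetical additions for illustration; columns A to C are the question's data.

```r
ABC <- data.frame(
  A = c("TWO", "ONE", "THREE", "ONE", "TWO"),
  B = c("ONE", "ONE", "TWO", "TWO", "THREE"),
  C = c("THREE", "TWO", "THREE", "ONE", "TWO"),
  D = c("ONE", "ONE", "TWO", "TWO", "ONE"),  # hypothetical column without THREE
  stringsAsFactors = FALSE
)

# Count every column against the same, explicitly listed levels.
words <- c("ONE", "TWO", "THREE")
counts <- sapply(ABC, function(x) table(factor(x, levels = words)))
print(counts)
```

The result is a matrix with one row per word and one column per data column, so it can be passed to barplot in the same way as tbl above.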

working with data in tables in R

I'm a newbie at working with R. I've got some data with multiple observations (i.e., rows) per subject. Each subject has a unique identifier (ID) and has another variable of interest (X) which is constant across each observation. The number of observations per subject differs.
The data might look like this:
ID Observation X
1 1 3
1 2 3
1 3 3
1 4 3
2 1 4
2 2 4
3 1 8
3 2 8
3 3 8
I'd like to find some code that would:
a) Identify the number of observations per subject
b) Identify subjects with at least a certain number of observations (e.g., >= 15 observations)
c) For subjects with at least that many observations, I'd like to manipulate the X value for each observation (e.g., I might want to subtract 1 from their X value, so I'd like to modify X for each observation to be X-1)
I might want to identify subjects with at least three observations and reduce their X value by 1. In the above, individuals #1 and #3 (ID) have at least three observations, and their X values--which are constant across all observations--are 3 and 8, respectively. I want to find code that would identify individuals #1 and #3 and then let me recode all of their X values into a different variable. Maybe I just want to subtract 1 from each X value. In that case, the code would then give me X values of (3-1=)2 for #1 and 7 for #3, but #2 would remain at X = 4.
Any suggestions appreciated, thanks!
You can use the aggregate function to do this.
a) Say your table is named temp. You can count the observations for each ID and X combination by using length as the aggregating function (sum would add up the observation numbers rather than count them):
tot <- aggregate(Observation ~ ID + X, temp, FUN = length)
The output will look like this:
ID X Observation
1 1 3 4
2 2 4 2
3 3 8 3
b) To see the IDs that have at least a certain number of observations, you can subset the table tot.
vals <- tot$ID[tot$Observation >= 3]
Output is:
[1] 1 3
c) To change the values that were found in (b), select the rows of tot whose ID is in vals, and then update those values.
tot$X[tot$ID %in% vals] <- tot$X[tot$ID %in% vals] - 1
The final output for the table is
ID X Observation
1 1 2 4
2 2 4 2
3 3 7 3
To change the original table, you can subset the table by the IDs you found
temp$X[temp$ID %in% vals] <- temp$X[temp$ID %in% vals] - 1
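The three steps from the question (count, filter, recode) can also be sketched with table() instead of aggregate. This is only one possible approach; the names n_obs, min_obs, ids, and X_new are made up for illustration, and the threshold of 3 is taken from the question's worked example.

```r
# Data from the question, stored in a data frame called temp.
temp <- data.frame(
  ID          = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Observation = c(1, 2, 3, 4, 1, 2, 1, 2, 3),
  X           = c(3, 3, 3, 3, 4, 4, 8, 8, 8)
)

# a) Number of observations (rows) per subject.
n_obs <- table(temp$ID)

# b) IDs with at least min_obs observations (illustrative threshold).
min_obs <- 3
ids <- as.numeric(names(n_obs)[as.vector(n_obs) >= min_obs])

# c) Store the recoded value in a new column, leaving X untouched.
temp$X_new <- ifelse(temp$ID %in% ids, temp$X - 1, temp$X)
print(temp)
```

Writing the result to a separate column keeps the original X available, which matches the question's wish to recode into a different variable.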
a) Identify the number of observations per subject
you can use this code on each variable:
summary

group and label rows in data frame by numeric in R

I need to group and label every x observations (rows) in a dataset in R.
I also need to know whether the last group of rows in the dataset has fewer than x observations.
For example:
Suppose I have a dataset with 10 observations and 2 variables, and I want to group every 3 rows.
I want to add a new column so that the dataset looks like this:
speed dist newcol
4 2 1
4 10 1
7 4 1
7 22 2
8 16 2
9 10 2
10 18 3
10 26 3
10 34 3
11 17 4
df$group <- rep(1:(nrow(df)/3), each = 3)
This works if the number of rows is an exact multiple of 3. Every three rows will get tagged in serial numbers.
A quick and dirty way to find out how incomplete the final group is: check the remainder when nrow is divided by the group size, nrow(df) %% 3 (change the divisor to your group size). A result of 0 means the last group is complete.
Assuming your data is df, you can do
df$newcol = rep(1:ceiling(nrow(df)/3), each = 3)[1:nrow(df)]
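Both parts of the question (labeling the groups and checking the last one) can be combined in a short sketch. It uses integer division instead of rep, which is an alternative to the answers above rather than their method; group_size, leftover, and last_group_size are illustrative names.

```r
# The 10 rows shown in the question.
df <- data.frame(
  speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11),
  dist  = c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)
)
group_size <- 3  # illustrative group size

# Integer-divide the zero-based row index by the group size, then shift to start at 1.
df$newcol <- (seq_len(nrow(df)) - 1) %/% group_size + 1

# Rows left over in the final group; 0 means the last group is complete.
leftover <- nrow(df) %% group_size
last_group_size <- if (leftover == 0) group_size else leftover
print(df)
```

This works for any number of rows, because no vector has to be trimmed to length afterwards.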

Sum of cells with same row and column name in R

I have a matrix created using the table() command in R, in which the rows and columns do not have the same names.
0 1 2
1 1 2 3
2 4 5 6
3 7 7 8
How can I sum the elements with the same row and column name? In this example it is equal to (2+6=)8.
Here's one approach:
# find the values present in both row names and column names
is <- do.call(intersect, unname(dimnames(x)))
# calculate the sum
sum(x[cbind(is, is)])
where x is your table.
Another one, self-explanatory:
sum(x[colnames(x)[col(x)] == rownames(x)[row(x)]])
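To try the first approach without the original data, the table from the question can be rebuilt as a plain matrix with character dimnames. This is a sketch under that assumption (a real table() result also carries dimnames, so the indexing works the same way); the names common and total are illustrative.

```r
# Rebuild the table from the question as a matrix with character dimnames.
x <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 7, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("0", "1", "2"))
)

# Labels that occur both as a row name and as a column name.
common <- intersect(rownames(x), colnames(x))

# A two-column character matrix selects the cells (row = label, column = label).
total <- sum(x[cbind(common, common)])
print(total)
```

Here the common labels are "1" and "2", so the selected cells are 2 and 6, giving the 8 stated in the question.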
