I have a table with column names
ID, X1, X2, X3
and rows like
1, Hi, Hello, NULL
2, NULL, Hello123, XXX
the output should be
1 X1
1 X2
2 X2
2 X3
The NULL values need to be filtered out, and the column names should appear as column values.
You have to combine two things:
1) Emitting the column name (the "title") as a column value.
2) UNPIVOT -> this operator turns the columns into rows underneath each other.
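A sketch in SQL Server syntax, assuming the table is called MyTable (the table name is not given in the question). The FOR clause of UNPIVOT emits the column name as a value, and UNPIVOT discards NULL values automatically, which takes care of the filtering:

```sql
SELECT ID, ColumnName
FROM MyTable
UNPIVOT (
    ColumnValue FOR ColumnName IN (X1, X2, X3)
) AS u
ORDER BY ID, ColumnName;
```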
Please bear with me: I am new to R and to this site as well. I am unable to upload an image of my dataset at the moment.
Here is my problem:
I have a dataset containing two columns that are of particular interest to me. One of them, Status, contains identifiers (1 and 2): 1 represents the variable Y1 and 2 represents the variable Y2. I need to run two separate regressions using Y1 and Y2 as dependent variables.
The other column Y1andY2 contains the respective value of Y1 and Y2 all merged into a single column. So I need a way of separating or grouping those values into Y1 and Y2. This would allow me to run the two separate regressions.
Status Y1andY2
1 1.521174
2 1.873917
2 2.116277
1 1.803262
1 3.725778
2 2.285313
1 2.732088
1 2.799842
2 2.976210
1 1.337500
1 1.259238
Your help would be greatly appreciated.
Thanks
Cheers
Ludov
I think you want to convert your data to wide format so that instead of having a Status ('key') column and a Y1andY2 ('values') column you want one column with the Y1 values and one with the Y2 values. This will make your df have half as many rows.
library(tidyr)
df %>% pivot_wider(names_from = 'Status', values_from = 'Y1andY2')
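Note that because the Status values repeat, pivot_wider on its own will warn that values are not uniquely identified and will produce list columns. A sketch that adds a per-group row index first (assuming df holds the two columns shown in the question):

```r
library(dplyr)
library(tidyr)

df %>%
  group_by(Status) %>%
  mutate(row = row_number()) %>%  # make each Status/row combination unique
  ungroup() %>%
  pivot_wider(names_from = Status, values_from = Y1andY2,
              names_prefix = "Y")
```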
I want to count the values of a categorical variable, grouped by Date.
I want the result as a matrix where the column names are the values of the categorical variable, the row names are the unique Date values, and the cells hold the counts.
The links below solve the group-by problem, but I am looking for the transformed df:
How to add count of unique values by group to R data.frame
R: Extract unique values in one column grouped by values in another column
My df has more than 50,000 rows and looks like:
dat <- data.frame(Date = c('06/08/2018','06/08/2018','07/08/2018','07/08/2018','08/08/2018','09/08/2018','09/08/2018','11/08/2018','11/08/2018','13/08/2018'),
Type= c('A','B','C','A','B','A','A','B','C','C'))
I want my resultant matrix to have "A", "B" and "C" as columns and the Date values as rows, with the counts as the cell values, as shown in the image below:
Also, it would be great not to hardcode the categorical values, so that if in future there are 4 instead of 3, the code handles it automatically.
How about using table...
mat <- table(dat$Date, dat$Type)
mat
A B C
06/08/2018 1 1 0
07/08/2018 1 0 1
08/08/2018 0 1 0
09/08/2018 2 0 0
11/08/2018 0 1 1
13/08/2018 0 0 1
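If you need a plain data frame rather than a table object (for example, to merge it with other data later), as.data.frame.matrix keeps the wide layout:

```r
df_wide <- as.data.frame.matrix(mat)
df_wide
```

Since table builds its dimensions from the data, new Type values are picked up automatically, with no hardcoding.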
What you're looking for is dcast() from reshape2:
library(reshape2)
dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")
This function will quickly aggregate your data based upon the fun.aggregate argument (in your case, length).
This uses spread, after counting with count:
library(tidyverse)
spread_data <- dat %>% count(Date, Type) %>% spread(Type, n, fill = 0)
I have a dataset in which I wish to sum each value in column n, with its corresponding value in column (n+(ncol/2)); i.e., so I can sum a value in column 1 row 1 with a value in column 12 row 1, for a dataset with 22 columns, and repeat this until column 11 is summed with column 22. The solution needs to work for hundreds of rows.
How do I do this using R, while ignoring the column names?
Suppose your data is
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])
You can do a simple matrix addition using numbers to select the columns:
output <- d[, 1:11] + d[, 12:22]
so, e.g.
all.equal(output[,1], d[,1] + d[,12])
# [1] TRUE
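To avoid hardcoding the column indices, the same addition can be written in terms of ncol(d) (a sketch, assuming an even number of columns):

```r
half <- ncol(d) / 2
output <- d[, 1:half] + d[, (half + 1):(2 * half)]
```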
I have a data frame with many rows and columns in it (3000x37) and I want to be able to select only rows that may have >= 2 columns of value "NA". These columns have data of different data types. I know how to do this in case I want to select only one column via:
df[is.na(df$col.name), ]
How do I make this selection if I want to check two (or more) columns?
First create a vector nn with the number of NA's in each row, and then select only those rows with >= 2 NA's using d[nn>=2,]:
d = data.frame(x=c(NA,1,2,3), y=c(NA,"a",NA,"c"))
nn = apply(d, 1, FUN=function (x) {sum(is.na(x))})
d[nn>=2,]
x y
1 NA <NA>
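An equivalent, vectorised way to build nn without apply (same d as above):

```r
nn <- rowSums(is.na(d))
d[nn >= 2, ]
```

rowSums(is.na(d)) works directly on a data frame and avoids the implicit conversion to a character matrix that apply performs.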
I have a 9801 by 3 reference table.
The first 2 columns of this table is defined as follows.
x1 = x2 = seq(0.01,0.99,0.01)
x12 = data.matrix(expand.grid(x1,x2))
The 3rd column contains the outcome values.
Now I have another n by 3 matrix where the 1st and 2nd columns are selected rows of the matrix 'x12' above and the 3rd column is to be filled. I would like to fill in the 3rd column of the 2nd table by looking up the same combination of the 1st and 2nd columns in the 1st table and taking the value in its 3rd column.
How can I do this?
You can do this with the merge function:
# Original data frame
x1 = x2 = seq(0.01,0.99,0.01)
x12 = expand.grid(x1,x2)
# Add a fake "outcome"
x12$outcome = rnorm(nrow(x12))
# New data frame with 100 random rows and the first two columns of x12
x12new = x12[sample(1:nrow(x12), 100), c(1,2)]
# Merge the outcome values from x12 into x12new
x12new = merge(x12new, x12, by=c("Var1","Var2"), all.x=TRUE)
by tells merge which columns must match when comparing the two data frames. all.x=TRUE tells merge to keep all rows from the first data frame, x12new in this case, even if they don't have a match in the second data frame (not an issue here, but you'll often want to make sure you don't lose any rows when merging).
One other thing to note is that, unlike vlookup in Excel, merge will increase the number of rows in the new, merged data frame if there are multiple rows that match the criteria. For example, see what happens when you merge values from df2 into df1:
df1 = data.frame(x = c(1,2,3,4), z=c(10,20,30,40))
df2 = data.frame(x = c(1,1,1,2,3), y=c("a","b","c","a","c"))
merge(df1, df2, by="x", all.x=TRUE)
x z y
1 1 10 a
2 1 10 b
3 1 10 c
4 2 20 a
5 3 30 c
6 4 40 <NA>
You can also use left_join from the dplyr package (other types of joins are available as well):
library(dplyr)
left_join(df1, df2, by="x")
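One difference worth knowing: merge() sorts the result by the key column(s) by default, while left_join() keeps the row order of the first data frame. merge() can mimic this with sort = FALSE:

```r
merge(df1, df2, by = "x", all.x = TRUE, sort = FALSE)
```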