How to display multiple columns data into single column - r

My data is in the following form:
Parameter Value Parameter Value Parameter Value
Speed 100 Time 1 Distance 260
and I want to display it in tabular format as all the 'Parameters' in one column and all the 'Values' in another column
Parameter Value
Speed 100
Time 1
Distance 260
Please help me with this.
Thanks in advance..!!

Here is a quick and dirty solution. I'm assuming the number of columns is even.
library(tidyverse)
library(magrittr)
library(janitor) # For making column names unique.
# Create your example dataset.
test = c('Speed', 100, 'Time', 1, 'Distance', 260) %>%
t() %>%
as_tibble() %>% # as.tibble() is deprecated in favour of as_tibble().
clean_names() # Make column names unique. tidyverse functions won't work otherwise.
# If you're reading your dataset into R via read_csv(), read_excel(), etc, be sure to
# run the imported tibble through clean_names().
# Create empty list to house each parameter and its value in each element.
params = list()
# Loop through the 'test' tibble, grabbing each parameter-value pair
# and throwing them in their own element of the list 'params'
for (i in 1:(ncol(test)/2)) {
# The 1st & 2nd columns are a parameter-value pair. Grab them.
params[[i]] = select(test, 1:2)
# Drop the 1st and second columns.
test = select(test, -(1:2))
}
# We want to combine the tibbles in 'params' row-wise into one big tibble.
# First, the column names must be the same for each tibble.
params = lapply(X = params, FUN = setNames, c('v1', 'v2'))
# Combine the tibbles row-wise into one big tibble.
test2 = do.call(rbind, params) %>%
set_colnames(c('Parameter', 'Value'))
# End. 'test2' is the desired output.
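If the row really is a strict alternation of parameter and value cells, the loop can probably be avoided altogether. The following is only a minimal base R sketch under that assumption; the `cells` vector stands in for one row of the real data and is typed in by hand here.

```r
# Assumption: a single row of alternating parameter/value cells,
# already read in as a character vector (hypothetical stand-in data).
cells <- c("Speed", "100", "Time", "1", "Distance", "260")

# Fill a two-column matrix row by row, so each parameter-value
# pair ends up on its own row.
pairs <- matrix(cells, ncol = 2, byrow = TRUE)

# Build the final data frame and convert the values to numbers.
result <- data.frame(
  Parameter = pairs[, 1],
  Value = as.numeric(pairs[, 2]),
  stringsAsFactors = FALSE
)
print(result)
```

This relies on the number of cells being even, the same assumption the loop makes.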

@Namrata, here is an approach that uses base R functions and doesn't require cleaning the column names.
rawData <- "Parameter Value Parameter Value Parameter Value
Speed 100 Time 1 Distance 260"
# read the data and convert to a matrix of character values
# read the first row as data, not variable names
theData <- as.matrix(read.table(textConnection(rawData),header=FALSE,
stringsAsFactors=FALSE))
# transpose so required data is in second column
transposedData <- t(theData)
# calculate indexes to extract parameter names (odd) and values (even) rows
# from column 2
parmIndex <- seq(from=1,to=nrow(transposedData)-1,by=2)
valueIndex <- seq(from=2,to=nrow(transposedData),by=2)
# create 2 vectors
parameter <- transposedData[parmIndex,2]
value <- transposedData[valueIndex,2]
# convert to data frame and reset rownames
resultData <- data.frame(parameter,value=as.numeric(value),stringsAsFactors=FALSE)
rownames(resultData) <- 1:nrow(resultData)
The resulting output is:
  parameter value
1     Speed   100
2      Time     1
3  Distance   260
regards,
Len

Related

Why does nrow() return a NULL value?

In writing code for a function, I have selected complete cases from the 2nd column of a data frame with 4 columns called "myData" and confirmed that 117 of >1700 rows have been selected into "mycases" by printing those values. The selection code is:
mycases <- myData[complete.cases(myData[,2]),2]
I can sum the values of these 117 cases successfully, but when I try to count them using code:
fkount <- nrow(mycases)
R returns NULL. What am I doing wrong? Is there an easier way to get the number of cases?
mycases is in your case a vector. If you want to know its length use length(mycases).
I guess you want something like this.
library(dplyr)
myData <- data.frame(A = c(1:3, NA), B = c(1,NA,2,NA))
myData %>% filter(complete.cases(.)) %>% nrow()
When you extract a single column from your data frame (or from a matrix), it is by default converted into a vector, and nrow does not work on vectors (since they don't have rows + columns).
You have (at least) 2 options:
1) Use length() instead. This will work, but has the risk that if you use the same code later to extract 2 (or more) columns, it will now give a probably-undesired result: either the total length of an extracted matrix (all the elements), or the number of columns of an extracted data frame.
2) Use the drop=FALSE argument of [ ]. This will prevent conversion of a single column into a vector, and it will remain a 2d object (but with ncol equal to 1). Then nrow will work as you intend.
Example:
mydata=data.frame(matrix(1:100,ncol=5))
# using length()
length( mydata[,2] )
# 20
# but watch out!
length( mydata[,2:3] )
# 2
# using drop=FALSE
nrow( mydata[,2,drop=FALSE] )
# 20
# safer:
nrow( mydata[,2:3,drop=FALSE] )
# 20
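Two further base R ways to count the cases, sketched on a small made-up data frame (this `myData` is hypothetical, not the asker's data): NROW() treats a vector as a one-column matrix, and summing the logical vector returned by complete.cases() counts the complete entries directly.

```r
# Hypothetical stand-in for the asker's data frame.
myData <- data.frame(A = c(1, 2, NA, 4), B = c(10, NA, 30, 40))

# Same selection as in the question: this yields a plain vector.
mycases <- myData[complete.cases(myData[, 2]), 2]

nrow(mycases)   # NULL, because a vector has no dim attribute
NROW(mycases)   # 3, NROW() treats the vector as a 1-column matrix

# Count without extracting anything: TRUE counts as 1 in sum().
n_complete <- sum(complete.cases(myData[, 2]))
n_complete      # 3
```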

Vector gets stored as a dataframe instead of being a vector

I am new to R and RStudio, and I need to create a vector that stores the first 100 rows of the CSV file the program reads. However, despite all my attempts, my variable v1 ends up as a data frame instead of an int vector. What can I do to solve this? Here's my code:
library(readr)
cup_data <- read_csv("C:/Users/Asus.DESKTOP-BTB81TA/Desktop/STUDY/YEAR 2/
YEAR 2 SEM 2/PREDICTIVE ANALYTICS(1_PA_011763)/Week 1 (Intro to PA)/
Practical/cup98lrn variable subset small.csv")
# Retrieve only the selected columns
cup_data_small <- cup_data[c("AGE", "RAMNTALL", "NGIFTALL", "LASTGIFT",
"GENDER", "TIMELAG", "AVGGIFT", "TARGET_B", "TARGET_D")]
str(cup_data_small)
cup_data_small
#get the number of columns and rows
ncol(cup_data_small)
nrow(cup_data_small)
cat("No of column",ncol(cup_data_small),"\nNo of Row :",nrow(cup_data_small))
#cat
#Concatenate and print
#Outputs the objects, concatenating the representations.
#cat performs much less conversion than print.
#Print the first 10 rows of cup_data_small
head(cup_data_small, n=10)
#Create a vector V1 by selecting first 100 rows of AGE
v1 <- cup_data_small[1:100,"AGE",]
Here's what my environment says:
cup_data_small is a tibble, a slightly modified version of a dataframe that has slightly different rules to try to avoid some common quirks/inconsistencies in standard dataframes. E.g. in a standard dataframe, df[, c("a")] gives you a vector, and df[, c("a", "b")] gives you a dataframe - you're using the same syntax so arguably they should give the same type of result.
To get just a vector from a tibble with [ ], you have to pass drop = TRUE explicitly (extracting the column with $ or [[ ]] also returns a vector), e.g.:
library(dplyr)
# Standard dataframe
iris[, "Species"]
iris_tibble = iris %>%
as_tibble()
# Remains a tibble/dataframe
iris_tibble[, "Species"]
# This gives you just the vector
iris_tibble[, "Species", drop = TRUE]
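Applied to the question, a sketch along these lines should give a plain vector of the first 100 values. The built-in iris data and its Sepal.Length column stand in for cup_data_small and AGE here, which is an assumption, since the original CSV is not available.

```r
library(tibble)

# Stand-in for cup_data_small (assumption: any tibble behaves the same way).
iris_tibble <- as_tibble(iris)

# Three ways to get the first 100 values of one column as a vector.
v_a <- iris_tibble[["Sepal.Length"]][1:100]              # [[ ]] extracts the column
v_b <- iris_tibble$Sepal.Length[1:100]                   # $ does the same
v_c <- iris_tibble[1:100, "Sepal.Length", drop = TRUE]   # [ ] with drop = TRUE

is.vector(v_a)        # TRUE
identical(v_a, v_b)   # TRUE
identical(v_a, v_c)   # TRUE
```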

Manipulating a dataset by separating variables

I have a data set that looks similar to the image shown below. In total, it is over 1,000 observations long. I want to create a new data frame that separates the single variable into 3 variables. The parts are separated by a "+" in each observation, so the "+" needs to be used as the delimiter.
Here is a solution using data.table:
library(data.table)
# Data frame
df <- data.frame(MovieId.Title.Genres = c("yyyy+xxxx+wwww", "zzzz+aaaa+aaaa"))
# Data frame to data table.
df <- data.table(df)
# Split column into parts.
df[, c("MovieId", "Title", "Genres") := tstrsplit(MovieId.Title.Genres, "\\+")]
# Print data table
df
I'll assume that your movieData object is a single column data.frame object.
If you want to split a single element from your data set, use strsplit using the character + (which R wants to see written as "\\+"):
# split the first element of movieData into a vector of strings:
strsplit(as.character(movieData[1,1]), "\\+")
Use lapply to apply this to the entire column, then massage the resulting list into a nice, usable data.frame:
# convert to a list of character vectors, one per row
# (strsplit returns a list, so take its first element):
step1 = lapply(movieData[,1], function(x) strsplit(as.character(x), "\\+")[[1]])
# step1 is a list, so bind its vectors row-wise and make the result a data.frame:
step2 = as.data.frame(do.call(rbind, step1), stringsAsFactors = FALSE)
# step2 is a nice data.frame, but its names are garbage. Fix it:
movieDataWithColumns = setNames(step2, c("MovieId", "Title", "Genres"))
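For a quick check of the strsplit approach, here is a self-contained sketch on two made-up rows (the movie values are hypothetical). It assumes every row contains exactly two "+" characters, and it uses fixed = TRUE as an alternative to escaping the "+".

```r
# Hypothetical single-column data frame in the shape described in the question.
movieData <- data.frame(
  MovieId.Title.Genres = c("1+Toy Story+Animation", "2+Heat+Crime"),
  stringsAsFactors = FALSE
)

# strsplit is vectorized over the column; fixed = TRUE treats "+" literally.
parts <- strsplit(movieData[, 1], "+", fixed = TRUE)

# Guard the assumption that every row splits into exactly three pieces.
stopifnot(all(lengths(parts) == 3))

# Bind the pieces row-wise and name the resulting columns.
movieDataWithColumns <- setNames(
  as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE),
  c("MovieId", "Title", "Genres")
)
print(movieDataWithColumns)
```

Rows with a "+" inside the title would break the three-piece assumption and would need separate handling.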

How to define 1000 dataframes in a single one?

My problem is the following. Suppose I have 1000 dataframes in R with the names eq1.1, eq1.2, ..., eq1.1000. I would like a single dataframe containing my 1000 dataframes. Normally, if I have only two dataframes, say eq1.1 and eq1.2 then I could define
df <- data.frame(eq1.1,eq1.2)
and I'm good. However, I can't follow this procedure because I have 1000 dataframes.
I was able to define a list containing the names of my 1000 dataframes using the code
names <- c()
for (i in 1:1000){names[i]<- paste0("eq1.",i)}
However, the elements of my list are recognized as strings and not as the dataframes that I previously defined.
Any help is appreciated!
How about
df.names <- ls(pattern = "^eq1\\.\\d")
eq1.dat <- do.call(cbind,
lapply(df.names,
get))
rm(list = df.names)
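One thing to keep in mind is that ls() returns names in alphabetical order ("eq1.1", "eq1.10", "eq1.100", ...), so the columns will not be in numeric order. A possible variant, sketched here with three tiny made-up data frames instead of 1000, builds the names in numeric order with paste0() and fetches them all at once with mget().

```r
# Hypothetical stand-ins: three small data frames named eq1.1 ... eq1.3.
for (i in 1:3) {
  assign(paste0("eq1.", i), data.frame(x = i, y = i * 10), envir = globalenv())
}

# Build the names in numeric order rather than relying on ls().
df_names <- paste0("eq1.", 1:3)

# mget() returns a named list of the objects; cbind them column-wise.
eq1_all <- do.call(cbind, mget(df_names, envir = globalenv()))

dim(eq1_all)   # 1 row, 6 columns
```

With the real data, 1:3 would become 1:1000; all data frames must have the same number of rows for cbind to work.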
library(stringi)
library(dplyr)
# recreate dummy data
lapply(1:1000,function(i){
assign(sprintf("eq1.%s",i),
as.data.frame(matrix(ncol = 12, nrow = 13, sample(1:15))),
envir = .GlobalEnv)
})
# Now have 1000 data frames in my working environment named eq1.[1:1000]
> str(ls(pattern = "eq1.\\d+"))
> chr [1:1000] "eq1.1" "eq1.10" "eq1.100" "eq1.1000" "eq1.101" "eq1.102" "eq1.103" ...
1) Create a holding data frame from the eq1.1 data frame; it will be appended to
on each iteration of the loop below.
empty_df <- eq1.1
2) Search for all the data frames named by this convention and build a data frame from the returned names. At this point they are only character strings, not the data frame objects themselves.
3) Mutate that data frame to add an indexing column, so that the names can be ordered properly from 1 to 1000; the character vector returned by ls() is sorted alphabetically, not numerically.
4) Drop the indexing column once the names are in the proper sequence, pull the dfs column back out as a character vector, and slice off the first value, since eq1.1 is already stored in empty_df.
5) Loop through that sequence and, on each iteration, bind the next data frame onto empty_df with a global assignment. After iteration 1, empty_df is the same as data.frame(eq1.1, eq1.2); after iteration 2, it is the same as data.frame(eq1.1, eq1.2, eq1.3).
NOTE: get() takes the character name and returns the object it refers to. See ?get for details.
lapply(
data.frame(dfs = ls(pattern = 'eq1\\.\\d+'))%>%
mutate(nth = as.numeric(stri_extract_last_regex(dfs,'\\d+'))) %>%
arrange(nth) %>% select(-nth) %>% slice(-1) %>% .$dfs, function(i){
empty_df <<- data.frame(empty_df, get(i))
}
)
All done: all the data frames are now bound into empty_df. To check:
> dim(empty_df)
[1] 13 12000

Select a column from multiple dataframes in a list

My list has multiple data frames with only two columns
DateTime Value
30-06-2016 100
31-07-2016 200
.
.
.
I just want to extract the column Value from each data frame in the list. The following code proved unsuccessful for me. What am I doing wrong here?
actual_data <- lapply(test_data, function(df) df[,is.numeric(df)])
> actual_data[[1]]
data frame with 0 columns and 12 rows
Thank you
purrr::map (an enhanced version of lapply) provides a shortcut for this type of operation:
# Generate test data
set.seed(35156)
test_df <- data.frame('DateTime' = rnorm(100), 'Value' = rnorm(100))
test_data <- rep(list(test_df), 100)
# Use `map` from the purrr package to subset the data.frames
purrr::map(test_data, 'Value')
purrr::map(test_data, 2)
As you can see in the example above, you can select columns in a data.frame either by name, by passing a character string as the second argument to purrr::map, or by position, by passing a number.
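As for why the original attempt returns zero columns: is.numeric(df) is evaluated on the data frame as a whole and returns a single FALSE, so df[, FALSE] selects nothing. A base R sketch of a per-column test, using a small made-up list in the shape described in the question:

```r
# Hypothetical test data: two identical data frames in a list.
test_df <- data.frame(
  DateTime = c("30-06-2016", "31-07-2016"),
  Value = c(100, 200),
  stringsAsFactors = FALSE
)
test_data <- list(test_df, test_df)

is.numeric(test_df)           # FALSE: a data frame is a list, not a numeric vector
sapply(test_df, is.numeric)   # one TRUE/FALSE per column

# Keep only the numeric columns of each data frame.
actual_data <- lapply(test_data, function(df) {
  df[, sapply(df, is.numeric), drop = FALSE]
})

# Or extract the Value column as a plain vector, by name.
value_only <- lapply(test_data, function(df) df[["Value"]])
```

drop = FALSE keeps each result a data frame even when only one column is numeric.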
