I am trying to read the following .csv file into R. As you can see from the image below, Row 2 has the unique variable names, while Row 3 has the values for those variables, so Rows 2/3 together represent one observation. This pattern continues: Row 4 is a line of variable names and Row 5 holds the corresponding values, so each two-row pair (2/3, 4/5, 6/7....999/1000) represents one observation. There are 1,000 observations total in the data set.
What I am having trouble with is reading this into R so that I have a more usable dataset. My goal is to have a standard set of variable names across the top row, and each subsequent line representing one observation.
Any expert R coders have suggestions?
Thank you,
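Here is a solution that worked on a simple test case I made. You'd first need to import your data into a data.frame, e.g. x = read.csv(file="your-file.csv", header=FALSE), so that the rows of variable names are kept as data instead of being used as the header.
To test this, though, I used the test data.frame, x: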
x=structure(list(V1 = structure(c(2L, 1L, 4L, 3L), .Label = c("1",
"a", "ab", "h"), class = "factor"), V2 = structure(c(2L, 1L,
4L, 3L), .Label = c("2", "b", "cd", "i"), class = "factor"),
V3 = structure(c(3L, 1L, 2L, 4L), .Label = c("3", "a", "c",
"ef"), class = "factor"), V4 = structure(c(3L, 1L, 2L, 4L
), .Label = c("4", "b", "d", "gh"), class = "factor"), V5 = structure(c(3L,
1L, 2L, 4L), .Label = c("5", "c", "e", "ij"), class = "factor"),
V6 = structure(c(3L, 1L, 2L, 4L), .Label = c("6", "d", "f",
"kl"), class = "factor"), V7 = structure(c(3L, 1L, 2L, 4L
), .Label = c("7", "e", "g", "mno"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7"), class = "data.frame", row.names = c(NA,
-4L))
Which turns this table (rows 1 and 3 are your labels):
V1 V2 V3 V4 V5 V6 V7
1 a b c d e f g
2 1 2 3 4 5 6 7
3 h i a b c d e
4 ab cd ef gh ij kl mno
Using this script to generate a final data.frame dat:
library(plyr)
variables = x[seq(1,nrow(x),2),] # df of all variable rows
values = x[seq(2,nrow(x),2),] # df of all value rows
dat=data.frame() # generate blank data.frame
for(i in 1:nrow(variables)) {
  dat.temp=data.frame(values[i,]) # make temporary df for row i of your values
  colnames(dat.temp)=as.matrix(variables[i,]) # name the temporary df from row i of your variables
  print(dat.temp) # check that they are coming out right (comment this out as necessary)
  dat=rbind.fill(dat,dat.temp) # append to the final data.frame, filling missing columns with NA
  rm(dat.temp) # remove the temporary df
}
Into this final table (variables are the column names now):
a b c d e f g h i
1 1 2 3 4 5 6 7 <NA> <NA>
2 ef gh ij kl mno <NA> <NA> ab cd
Hope it works.
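With 1,000 rows, growing a data.frame inside a loop can get slow. Below is a minimal base-R sketch of the same idea that fills a preallocated matrix instead. It assumes the data has been read so that every odd row holds names and every even row holds values; the three-column x and the names name_rows, value_rows and out are made up for illustration.

```r
# Made-up stand-in for the imported data: odd rows are names, even rows are values
x <- data.frame(V1 = c("a", "1", "h", "ab"),
                V2 = c("b", "2", "i", "cd"),
                V3 = c("c", "3", "a", "ef"),
                stringsAsFactors = FALSE)

m <- as.matrix(x)                                          # character matrix (also works for factor columns)
name_rows  <- m[seq(1, nrow(m), by = 2), , drop = FALSE]   # rows holding variable names
value_rows <- m[seq(2, nrow(m), by = 2), , drop = FALSE]   # rows holding the values

# Union of all variable names, in order of first appearance
all_names <- unique(as.vector(t(name_rows)))

# Preallocate the result and fill one observation per row
out <- matrix(NA_character_, nrow = nrow(value_rows), ncol = length(all_names),
              dimnames = list(NULL, all_names))
for (i in seq_len(nrow(value_rows))) {
  out[i, name_rows[i, ]] <- value_rows[i, ]
}
dat <- as.data.frame(out, stringsAsFactors = FALSE)
dat
```

Every column comes out as character here, so you would still need to convert numeric variables afterwards (for example with type.convert).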
I would like to join two tables repeatedly, once per group. Here are the tables.
structure(list(key = structure(1:4, .Label = c("A", "B", "C", "D"),
class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
structure(list(key = structure(c(1L, 2L, 2L, 3L), .Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L, 2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
<joined>
key
A
B
C
D
<joining>
key source value
A a 1
B a 1
B b 2
C b 2
If I use the left_join function, as in left_join(joined, joining, by = "key"), the result is:
key source value
1 A a 1
2 B a 1
3 B b 2
4 C b 2
5 D <NA> NA
However, I want to do the join separately for each group of "source". My expected result, and the manual way I currently get it, is:
joining_a <- joining %>%
filter(source == "a")
joining_b <- joining %>%
filter(source == "b")
left_join(joined, joining_a, by = "key")
left_join(joined, joining_b, by = "key")
bind_rows(left_join(joined, joining_a, by = "key"), left_join(joined, joining_b, by = "key"))
key source value
1 A a 1
2 B a 1
3 C <NA> NA
4 D <NA> NA
5 A <NA> NA
6 B b 2
7 C b 2
8 D <NA> NA
How can I do this join without manually splitting the table into pieces?
We can use group_split (or split from base R) to split 'joining' into a list and then do the left_join with 'joined' on each piece using map
library(tidyverse)
joining %>%
group_split(source) %>%
map_dfr(~ left_join(joined, .x, by = 'key'))
# key source value
#1 A a 1
#2 B a 1
#3 C <NA> NA
#4 D <NA> NA
#5 A <NA> NA
#6 B b 2
#7 C b 2
#8 D <NA> NA
Or without a lambda function
joining %>%
group_split(source) %>%
map_dfr(left_join, x = joined, by = 'key')
data
joined <- structure(list(key = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
joining <- structure(list(key = structure(c(1L, 2L, 2L, 3L),
.Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L,
2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L,
2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
This is my dataframe:
df<-list(structure(list(Col1 = structure(1:6, .Label = c("A", "B",
"C", "D", "E", "F"), class = "factor"), Col2 = structure(c(1L,
2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Col1 = structure(c(1L, 4L, 5L, 6L, 2L,
3L), .Label = c("A", "E", "H", "M", "N", "P"), class = "factor"),
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B",
"C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Col1 = structure(c(1L, 4L, 6L, 5L, 2L,
3L), .Label = c("A", "W", "H", "M", "T", "U"), class = "factor"),
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B",
"C", "D", "S", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)))
I want to extract col1=df[[1]][1] as a dataframe. Then I want to merge column 1 of the second element of the list with df[[1]][1], which gives a dataframe with 2 columns.
After this I want to merge column 1 of the third element of the list with that two-column dataframe, which gives a dataframe with 3 columns.
In other words, my dataframe should have 3 columns: the first column of each element of my list.
Can the dplyr package help me do this?
Any help?
You can use lapply to extract the three columns named "Col1" in one go. Then set the names of the result.
col1 <- as.data.frame(lapply(df, '[[', "Col1"))
names(col1) <- letters[seq_along(col1)]
col1
# a b c
#1 A A A
#2 B M M
#3 C N U
#4 D P T
#5 E E W
#6 F H H
Choose any other column names that you might find better.
A dplyr way could be
library(dplyr)
df %>%
unlist(recursive = FALSE) %>%
as.data.frame %>%
select(., starts_with("Col1"))
# Col1 Col1.1 Col1.2
#1 A A A
#2 B M M
#3 C N U
#4 D P T
#5 E E W
#6 F H H
With map_dfc from purrr:
library(purrr)
map_dfc(df, `[`, 1)
Output:
Col1 Col11 Col12
1 A A A
2 B M M
3 C N U
4 D P T
5 E E W
6 F H H
An alternative is to use map_dfc with purrr's concise element-extraction syntax, which lets you pick elements of each list element by name or by position. The first call below, map_dfc(df, 1), is for example equivalent to
map_dfc(df, `[[`, 1)
which differs from the use of [ in that the columns will not be named variations of Col1 and just get V names instead, which may be desirable since names like Col11 and Col12 may be confusing.
df <- list(structure(list(Col1 = structure(1:6, .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA, -6L)), structure(list(Col1 = structure(c(1L, 4L, 5L, 6L, 2L, 3L), .Label = c("A", "E", "H", "M", "N", "P"), class = "factor"), Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA, -6L)), structure(list(Col1 = structure(c(1L, 4L, 6L, 5L, 2L, 3L), .Label = c("A", "W", "H", "M", "T", "U"), class = "factor"), Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "S", "G"), class = "factor")), class = "data.frame", row.names = c(NA, -6L)))
library(purrr)
map_dfc(df, 1)
#> # A tibble: 6 x 3
#> V1 V2 V3
#> <fct> <fct> <fct>
#> 1 A A A
#> 2 B M M
#> 3 C N U
#> 4 D P T
#> 5 E E W
#> 6 F H H
map_dfc(df, "Col1")
#> # A tibble: 6 x 3
#> V1 V2 V3
#> <fct> <fct> <fct>
#> 1 A A A
#> 2 B M M
#> 3 C N U
#> 4 D P T
#> 5 E E W
#> 6 F H H
Created on 2018-09-19 by the reprex package (v0.2.0).
res <- 1:nrow(df[[1]][1])
for (i in 1:length(df)) {
  print(as.vector(df[[i]][1]))
  res <- cbind(res, as.data.frame(df[[i]][1]))
}
res$res <- NULL
So, the output is:
Col1 Col1 Col1
1 A A A
2 B M M
3 C N U
4 D P T
5 E E W
6 F H H
Using sapply with the pipe (%>%, loaded here via dplyr)
library(dplyr)
df %>%
sapply('[[',1) %>%
as.data.frame
#returns
V1 V2 V3
1 A A A
2 B M M
3 C N U
4 D P T
5 E E W
6 F H H
I have five factor columns with 2 levels each, and their column names are like c(a,b,x,y,z). The commands below work for 1 column at a time, but I need to do this for all five columns at the same time.
levels(car_data[,"x"]) <- c(0,1)
car_data[,"x"] <- as.numeric(levels(car_data[,"x"]))[car_data[,"x"]]
If there are two levels, then we can do
library(dplyr)
car_data %>%
mutate_all(funs(as.integer(.)-1))
# a b c
#1 0 0 0
#2 1 1 1
#3 0 0 0
#4 1 1 1
data
car_data <- structure(list(a = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), b = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), c = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor")), .Names = c("a", "b", "c"), row.names = c(NA,
-4L), class = "data.frame")
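If only some of the columns should be converted, or you want to avoid the dplyr dependency, a base-R sketch along these lines may help. It assumes, as above, that each selected column is a two-level factor; the vector cols is a hypothetical name for the columns you want to recode.

```r
# Same example data as above
car_data <- data.frame(a = factor(c("a", "b", "a", "b")),
                       b = factor(c("a", "b", "a", "b")),
                       c = factor(c("a", "b", "a", "b")))

# Columns to recode (hypothetical selection; use your own five names here)
cols <- c("a", "b", "c")

# as.integer() on a factor gives the level codes 1 and 2; subtract 1 to get 0 and 1
car_data[cols] <- lapply(car_data[cols], function(col) as.integer(col) - 1L)
car_data
```

The first level of each factor becomes 0 and the second becomes 1, so check levels() beforehand if the direction of the coding matters.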
I would simply like to represent a sequence of categorical states with different colours.
This kind of plot is also known as an individual sequence plot (TraMineR).
I would like to use ggplot2.
My data simply look like this
> head(dta)
V1 V2 V3 V4 V5 id
1 b a e d c 1
2 d b a e c 2
3 b c a e d 3
4 c b a e d 4
5 b c e a d 5
with the personal id in the last column.
The plot looks like this.
Each letter (state) is represented by a colour. Basically, this plot visualises the successive states for each individual.
Blue is a, Red is b, Purple is c, Yellow is d and Brown is e.
Any idea how I could do this with ggplot2?
dta = structure(list(V1 = structure(c(1L, 3L, 1L, 2L, 1L), .Label = c("b",
"c", "d"), class = "factor"), V2 = structure(c(1L, 2L, 3L, 2L,
3L), .Label = c("a", "b", "c"), class = "factor"), V3 = structure(c(2L,
1L, 1L, 1L, 2L), .Label = c("a", "e"), class = "factor"), V4 = structure(c(2L,
3L, 3L, 3L, 1L), .Label = c("a", "d", "e"), class = "factor"),
V5 = structure(c(1L, 1L, 2L, 2L, 2L), .Label = c("c", "d"
), class = "factor"), id = 1:5), .Names = c("V1", "V2", "V3",
"V4", "V5", "id"), row.names = c(NA, -5L), class = "data.frame")
What I tried so far:
nr = nrow(dta3)
nc = ncol(dta3)
# space
m = 0.8
n = 1 # do not touch this one
plot(0, xlim = c(1,nc*n), ylim = c(1, nr), type = 'n', axes = F, ylab = 'individual sequences', xlab = 'Time')
axis(1, at = c(1:nc*m), labels = c(1:nc))
axis(2, at = c(1:nr), labels = c(1:nr) )
for(i in 1:nc){
points(x = rep(i*m,nr) , y = 1:nr, col = dta3[,i], pch = 15)
}
But it is not with ggplot2 and not very satisfying.
Here you go:
library(reshape2)
library(ggplot2)
m_dta <- melt(dta,id.var="id")
m_dta
p1 <- ggplot(m_dta,aes(x=variable,y=id,fill=value))+
geom_tile()
p1
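If you want the colours listed in the question, a sketch along these lines may work. It reshapes the data with base R instead of melt and sets the fill colours by hand with scale_fill_manual; the names long, time, state and p2 are just illustrative, and the colour names are my reading of the question, not something taken from the original plot.

```r
library(ggplot2)

# Same example data as in the question, with character columns
dta <- data.frame(V1 = c("b", "d", "b", "c", "b"),
                  V2 = c("a", "b", "c", "b", "c"),
                  V3 = c("e", "a", "a", "a", "e"),
                  V4 = c("d", "e", "e", "e", "a"),
                  V5 = c("c", "c", "d", "d", "d"),
                  id = 1:5,
                  stringsAsFactors = FALSE)

# Long format: one row per individual and time point
state_cols <- paste0("V", 1:5)
long <- data.frame(id    = rep(dta$id, times = length(state_cols)),
                   time  = rep(state_cols, each = nrow(dta)),
                   state = unlist(dta[state_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# One tile per state, coloured as described in the question
p2 <- ggplot(long, aes(x = time, y = factor(id), fill = state)) +
  geom_tile(colour = "white") +
  scale_fill_manual(values = c(a = "blue", b = "red", c = "purple",
                               d = "yellow", e = "brown")) +
  labs(x = "Time", y = "individual sequences")
p2
```

Treating id as a factor on the y axis keeps one discrete row per individual instead of a continuous scale.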
I have a sales report table (DF1) and I need to replace only a few product codes with their associated group codes
Model SOLD
A 5
B 4
C 4
D 3
F 11
I have another table (DF2) where I have the Model# and the associated group codes
Model Group
A 1
B 1
C 2
D 2
I would like to replace the model# in DF1 with the group number if the model exists in DF2.
The wanted end result:
Model SOLD
1 5
1 4
2 4
2 3
F 11
Thank you!
You can do this with qdapTools's lookup family, specifically the binary operator %lc+% (a wrapper for the data.table package). The l stands for lookup, the c forces the terms to character, and the + replaces only those elements that are found in the lookup table:
library(qdap)
df1$Model <- df1$Model %lc+% df2
Here it is more explicitly:
df1 <- structure(list(Model = structure(1:5, .Label = c("A", "B", "C",
"D", "F"), class = "factor"), SOLD = c(5L, 4L, 4L, 3L, 11L)), .Names = c("Model",
"SOLD"), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Model = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor"), Group = c(1L, 1L, 2L, 2L)), .Names = c("Model",
"Group"), class = "data.frame", row.names = c(NA, -4L))
library(qdap)
df1$Model <- df1$Model %lc+% df2
df1
## Model SOLD
## 1 1 5
## 2 1 4
## 3 2 4
## 4 2 3
## 5 F 11
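If you would rather not add a package for this, the same replacement can be sketched in base R with match. This is an alternative to the %lc+% approach, not what qdap does internally; idx is just an illustrative name, and the data are rebuilt with character columns for simplicity.

```r
# Same example data as above, with character columns instead of factors
df1 <- data.frame(Model = c("A", "B", "C", "D", "F"),
                  SOLD  = c(5L, 4L, 4L, 3L, 11L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Model = c("A", "B", "C", "D"),
                  Group = c(1L, 1L, 2L, 2L),
                  stringsAsFactors = FALSE)

# Position of each df1 model in the lookup table (NA when the model is not listed)
idx <- match(df1$Model, df2$Model)

# Keep the original code where there is no match, otherwise use the group code
df1$Model <- ifelse(is.na(idx), df1$Model, as.character(df2$Group[idx]))
df1
```

The Model column stays character, since it now mixes group numbers with unmatched model codes such as "F".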