I am trying to plot a line chart in R showing the number of tweets against date and time.
library(ggplot2)
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("2020-03-12",
"2020-03-13"), class = "factor"), Time = structure(c(1L, 1L, 2L,
3L, 4L, 5L), .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z",
"00:25:12Z", "01:00:02Z"), class = "factor"), Text = structure(c(5L,
3L, 6L, 4L, 2L, 1L), .Label = c("The images of demonstrations and gathering", "Premium policy get activate by company abc",
"Launches of rocket", "Premium policy get activate by company abc",
"Technology makes trend", "The images of demonstrations and gatherings",
"Weather forecasting by xyz"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
ggplot(df1, aes(x = Date, y = text(count)) + geom_line(aes(color = variable), size = 1)
I tried the code above to produce the plot but got an error. The dataset is a CSV file that looks like this:
Date Time Text
2020-03-12 00:00:00Z The images of demonstrations and gatherings
2020-03-12 00:00:00Z Premium policy get activate by company abc
2020-03-12 00:00:01Z Weather forecasting by xyz
2020-03-12 00:10:04Z Technology makes trend
2020-03-12 00:25:12Z Launches of rocket
2020-03-12 01:00:02Z Government launch new policy to different sector improvement
I have a dataset covering nearly 15 days and want a line chart of the number of tweets (one tweet per row of the Text column) to see how tweet volume trends across dates and times.
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("3/12/2020",
"3/13/2020"), class = "factor"), Time = structure(c(1L, 1L, 2L,
3L, 4L, 5L), .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z",
"00:25:12Z", "01:00:02Z"), class = "factor"), Text = structure(c(5L,
3L, 6L, 4L, 2L, 1L), .Label = c("Government launch new policy to different sector",
"Launches of rocket", "Premium policy get activate by company abc",
"Technology makes trend", "The images of demonstrations and gatherings",
"Weather forecasting by xyz"), class = "factor"), X = structure(c(1L,
1L, 1L, 1L, 1L, 2L), .Label = c("", "improvement"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
Creating the dataset df1 as above and then running the following gives the required plot by hour:
library(tidyverse)
library(lubridate)
df1 %>%
  mutate(Time = hms(Time),
         Date = mdy(Date),
         hour = hour(Time)) %>%
  count(hour) %>%
  ggplot(aes(hour, n, group = 1)) + geom_line() + geom_point()
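Note that count(hour) groups by hour of day only, so tweets from different dates that fall in the same hour are added together. If one point per date and hour is wanted instead, a minimal base R sketch could look like this. The data frame tweets and its values are made up for illustration, and the format string assumes every Time value ends in a literal Z.

```r
# Toy data in the same shape as the question's CSV (values are illustrative)
tweets <- data.frame(
  Date = c("2020-03-12", "2020-03-12", "2020-03-12", "2020-03-13"),
  Time = c("00:00:00Z", "00:10:04Z", "01:00:02Z", "00:00:01Z"),
  stringsAsFactors = FALSE
)

# Combine date and time; the literal "Z" in the format consumes the trailing Z
tweets$dt <- as.POSIXct(paste(tweets$Date, tweets$Time),
                        format = "%Y-%m-%d %H:%M:%SZ", tz = "UTC")

# Label each tweet with the start of its date-hour bucket
tweets$bucket <- format(tweets$dt, "%Y-%m-%d %H:00")

# Count tweets per bucket; the result has columns bucket and Freq
counts <- as.data.frame(table(bucket = tweets$bucket), stringsAsFactors = FALSE)
counts
```

The counts data frame can then be passed to ggplot with geom_line, after converting bucket back to a date-time with as.POSIXct.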
Is this what you are after?
library(dplyr)
library(lubridate)
library(stringr)
library(ggplot2)
Answer with your data
To demonstrate data wrangling.
# your data;
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L),
.Label = c("2020-03-12","2020-03-13"),
class = "factor"),
Time = structure(c(1L, 1L, 2L,3L, 4L, 5L),
.Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z","00:25:12Z", "01:00:02Z"),
class = "factor"),
Text = structure(c(5L,3L, 6L, 4L, 2L, 1L),
.Label = c("The images of demonstrations and gathering", "Premium policy get activate by company abc",
"Launches of rocket", "Premium policy get activate by company abc",
"Technology makes trend", "The images of demonstrations and gatherings", "Weather forecasting by xyz"), class = "factor")),
class = "data.frame", row.names = c(NA,-6L))
# data wrangle
df2 <-
df1 %>%
# change all variables from factors to character
mutate_all(as.character) %>%
mutate(Time = str_remove(Time, "Z$"), #remove the trailing 'Z' from Time values
dt = ymd_hms(paste(Date, Time, sep = " ")), # change text into datetime format using lubridate::ymd_hms
dt = ceiling_date(dt, unit="hour")) %>% # round to the end of the named hour, separated for clarity
group_by(dt) %>%
summarise(nr_tweets = n())
# plot
p1 <- ggplot(df2, aes(dt, nr_tweets))+
geom_line()+
scale_x_datetime(date_breaks = "1 day", date_labels = "%d/%m")+
ggtitle("Data from question `df1`")
Answer with made up large dataset
tib <- tibble(dt = sample(seq(ISOdate(2020,05,01), ISOdate(2020,05,15), by = "sec"), 10000, replace = TRUE),
text = sample(c(letters[1:26], LETTERS[1:26]), 10000, replace = TRUE))
tib1 <-
tib %>%
mutate(dt = round_date(dt, unit="hour"))%>%
group_by(dt) %>%
summarise(nr_tweets = n())
p2 <- ggplot(tib1, aes(dt, nr_tweets))+
geom_line()+
scale_x_datetime(date_breaks = "1 day", date_labels = "%d/%m")+
ggtitle("Result using `tib` data made up to answer the question")
library(patchwork) # provides the `/` operator used to stack the two plots
p1/p2
Created on 2020-05-13 by the reprex package (v0.3.0)
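The two pipelines in this answer bucket timestamps differently: the first uses ceiling_date and the second round_date. A small sketch of how the lubridate rounding helpers differ may help when choosing one. The two timestamps are made up and not taken from the question's data.

```r
library(lubridate)

# Two illustrative timestamps within the same hour
x <- ymd_hms(c("2020-03-12 00:25:12", "2020-03-12 00:40:00"), tz = "UTC")

floor_x   <- floor_date(x, unit = "hour")    # both fall back to 00:00:00
round_x   <- round_date(x, unit = "hour")    # 00:00:00 and 01:00:00
ceiling_x <- ceiling_date(x, unit = "hour")  # both move forward to 01:00:00

data.frame(x, floor_x, round_x, ceiling_x)
```

With floor_date each bucket is labelled by the hour in which the tweets were actually posted, which may be the easiest to read on a time axis.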
I have some data that looks like this: https://i.imgur.com/hzEd7bT.png
These will be pro league of legends matches once they occur over the course of the next few months. I filled out a few as examples.
Rows 6-10 are champions that each team banned. Rows 11-15 are champions that each team picked.
Each week has about 10 games and there are 9 weeks.
The B and R at the top are Blue (side) and Red (side) in the game. Blue side always gets first choice of champion and red side always gets last choice.
I want to find the best (or worst) synergizing champions
To clarify what I mean by this, in my screenshot the team with Brand and Yuumi won both times while the team with Aurelion Sol and Azir lost both times.
Ideally, I want to know how many times each combination of 2, 3, 4, or 5 champions was picked and the corresponding win rate.
Edit: I am not sure exactly how the data needs to look in R because I have never done this before, but I made two different versions of inputting it below.
LoLGames <- matrix(c('W','Annie','Ezreal','Yuumi','Camille','Brand',
'L','Nasus', 'Aurelion Sol', 'Azir', 'Blitzcrank', 'Caitlyn',
'L','Nasus', 'Aurelion Sol', 'Blitzcrank', 'Ezreal', 'Camille',
'W','Bard', 'Ashe', 'Yuumi', 'Kogmaw', 'Brand'),
ncol = 6, byrow = TRUE)
colnames(LoLGames) <- c("Result","Champ1","Champ2","Champ3","Champ4","Champ5")
rownames(LoLGames) <- c("Game1","Game2","Game3","Game4")
LoLGames <- as.table(LoLGames)
*Corresponding dput
structure(c("W", "L", "L", "W", "Annie", "Nasus", "Nasus", "Bard",
"Ezreal", "Aurelion Sol", "Aurelion Sol", "Ashe", "Yuumi", "Azir",
"Blitzcrank", "Yuumi", "Camille", "Blitzcrank", "Ezreal", "Kogmaw",
"Brand", "Caitlyn", "Camille", "Brand"), .Dim = c(4L, 6L), .Dimnames = list(
c("Game1", "Game2", "Game3", "Game4"), c("Result", "Champ1",
"Champ2", "Champ3", "Champ4", "Champ5")), class = "table")
Result <- c('W','L','L','W',NA)
G1W <- c('Annie','Ezreal','Yuumi','Camille','Brand')
G1L <- c('Nasus', 'Aurelion Sol', 'Blitzcrank', 'Ezreal', 'Camille')
G2L <- c('Nasus', 'Aurelion Sol', 'Blitzcrank', 'Ezreal', 'Camille')
G2W <- c('Bard', 'Ashe', 'Yuumi', 'Kogmaw', 'Brand')
LoLDf <- data.frame(Result, G1W, G1L, G2L, G2W)
*Corresponding dput
structure(list(Result = structure(c(2L, 1L, 1L, 2L, NA), .Label = c("L",
"W"), class = "factor"), G1W = structure(c(1L, 4L, 5L, 3L, 2L
), .Label = c("Annie", "Brand", "Camille", "Ezreal", "Yuumi"), class = "factor"),
G1L = structure(c(5L, 1L, 2L, 4L, 3L), .Label = c("Aurelion Sol",
"Blitzcrank", "Camille", "Ezreal", "Nasus"), class = "factor"),
G2L = structure(c(5L, 1L, 2L, 4L, 3L), .Label = c("Aurelion Sol",
"Blitzcrank", "Camille", "Ezreal", "Nasus"), class = "factor"),
G2W = structure(c(2L, 1L, 5L, 4L, 3L), .Label = c("Ashe",
"Bard", "Brand", "Kogmaw", "Yuumi"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
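One possible way to get the counts and win rates asked for is to enumerate every combination of k champions per team with combn and then aggregate. This is only a sketch under assumptions: it uses the first layout (one row per team per game, rebuilt here as a data frame called games), and the helper combo_winrate is a name invented for this example.

```r
# Same four rows as the LoLGames matrix, as a data frame
games <- data.frame(
  Result = c("W", "L", "L", "W"),
  Champ1 = c("Annie", "Nasus", "Nasus", "Bard"),
  Champ2 = c("Ezreal", "Aurelion Sol", "Aurelion Sol", "Ashe"),
  Champ3 = c("Yuumi", "Azir", "Blitzcrank", "Yuumi"),
  Champ4 = c("Camille", "Blitzcrank", "Ezreal", "Kogmaw"),
  Champ5 = c("Brand", "Caitlyn", "Camille", "Brand"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick count and win rate for every k-champion combination
combo_winrate <- function(games, k = 2) {
  champ_cols <- grep("^Champ", names(games))
  per_game <- lapply(seq_len(nrow(games)), function(i) {
    # Sort so that "Brand + Yuumi" and "Yuumi + Brand" become the same key
    champs <- sort(unlist(games[i, champ_cols], use.names = FALSE))
    data.frame(
      combo = combn(champs, k, FUN = paste, collapse = " + "),
      win = games$Result[i] == "W",
      stringsAsFactors = FALSE
    )
  })
  all_combos <- do.call(rbind, per_game)
  picked <- tapply(all_combos$win, all_combos$combo, length)
  winrate <- tapply(all_combos$win, all_combos$combo, mean)
  data.frame(combo = names(picked), picked = as.vector(picked),
             winrate = as.vector(winrate), stringsAsFactors = FALSE)
}

pair_stats <- combo_winrate(games, k = 2)
pair_stats[pair_stats$picked > 1, ]  # pairs seen more than once
```

Calling combo_winrate(games, k = 3) and so on gives the larger combinations. With only about 90 games, most combinations of three or more champions will probably appear once, so their win rates should be read with care.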
I'm trying to plot a basic database on a Leaflet map. The data is a CSV with the columns "Artist", "Address", "lat", and "Lon".
structure(list(Artist = structure(1:6, .Label = c("50 Cent",
"A Tribe Called Quest", "Aesop Rock", "B.I.G.", "Beastie Boys",
"Big Daddy Kane"), class = "factor"), Address = structure(c(1L,
4L, 2L, 3L, 1L, 1L), .Label = c("", "45 JOHN STREET SUITE 711 NEW YORK, NEW YORK, 10038 ",
"Crown Heights", "Linden Boulevard and S192nd"), class = "factor"),
Lng = c(-73.79191, 73.759296, -74.007918, -73.963667, -73.910054,
-73.941774), lat = c(40.680859, 40.692427, 40.709348, 40.682806,
42.02039, 40.687218), Base.of.Operations = structure(c(4L,
2L, 3L, 1L, 1L, 1L), .Label = c("Brooklyn", "Jamaica", "Lower East-Side",
"South Jamaica"), class = "factor"), Significance = structure(c(3L,
5L, 2L, 4L, 1L, 1L), .Label = c("", "First place he recorded, El-Ps studio Definitive Records.",
"Grew Up", "Home", "Where \"Check the Rhyme\" was Shot"), class = "factor"),
Date.of.first.release = structure(c(3L, 4L, 1L, 6L, 2L, 5L
), .Label = c("11/1/1999", "11/15/1986", "2/6/2003", "4/17/1990",
"6/28/1988", "9/13/1994"), class = "factor"), X = c(NA, NA,
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
I've tried using it as a CSV, tibble, and data.frame, and selecting the individual columns, but it responds that the "Lon" column is not an object.
library(dplyr)
library(leaflet)
library(sp)
"example" <- read_csv("database.csv", T)
df <- tibble("example")
map <- leaflet(df) %>%
addTiles() %>%
setView(lat=40.742054, lng=-73.769417, zoom = 9) %>%
addMarkers(df)
map
I tried calling the columns in the addMarkers() and still got an error.
database <- read.csv("map.csv")
db <- tibble(database)
map <- leaflet(db) %>%
addTiles() %>%
setView(lat=40.742054, lng=-73.769417, zoom = 9) %>%
addMarkers(lng = ~Lng, lat = ~lat)
Result
Error in eval(f[[2]], metaData(data), environment(f)) :
object 'Lng' not found
What the what
My original Db was corrupted. I reformatted it to native CSV and it worked fine.
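For anyone hitting the same "object 'Lng' not found" error: a quick check is to look at the column names R actually read before building the map, since a damaged or oddly delimited file can yield different names than expected. A minimal sketch, assuming a small inline CSV (csv_text is made up here) in place of the real file:

```r
# Inline stand-in for the CSV file; the real data would come from read.csv("map.csv")
csv_text <- "Artist,Address,Lng,lat
50 Cent,,-73.79191,40.680859
Aesop Rock,45 JOHN STREET,-74.007918,40.709348"

db <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect what was actually read
print(names(db))

# Stop early with a readable message if the coordinate columns are missing
required <- c("Lng", "lat")
missing_cols <- setdiff(required, names(db))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# Build the map only when leaflet is installed
if (requireNamespace("leaflet", quietly = TRUE)) {
  map <- leaflet::addMarkers(
    leaflet::addTiles(leaflet::leaflet(db)),
    lng = ~Lng, lat = ~lat
  )
}
```

If names(db) shows a single long name or names with unexpected characters, the file format is the likely culprit rather than the leaflet call.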
This is less a coding question than a call for help with the general approach ;-) I prepared a table containing taxonomic information about organisms. All I have are the names of these organisms, so there are no values from which a distance or clustering could be computed. I just want to use these factors to create a plot that shows the relationships. My data looks like this:
test2<-structure(list(genus = structure(c(4L, 2L, 7L, 8L, 6L, 1L, 3L,
5L, 5L), .Label = c("Aminobacter", "Bradyrhizobium", "Hoeflea",
"Hyphomonas", "Mesorhizobium", "Methylosinus", "Ochrobactrum",
"uncultured"), class = "factor"), family = structure(c(4L, 1L,
2L, 3L, 5L, 6L, 6L, 6L, 6L), .Label = c("Bradyrhizobiaceae",
"Brucellaceae", "Hyphomicrobiaceae", "Hyphomonadaceae", "Methylocystaceae",
"Phyllobacteriaceae"), class = "factor"), order = structure(c(1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Caulobacterales",
"Rhizobiales"), class = "factor"), class = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Alphaproteobacteria", class = "factor"),
phylum = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Proteobacteria", class = "factor")), .Names = c("genus",
"family", "order", "class", "phylum"), class = "data.frame", row.names = c(NA,
9L))
Is it necessary to set up artificial values to describe a distance between the levels?
Here is an attempt using the data.tree library.
First create a string variable in the form:
Proteobacteria/Alphaproteobacteria/Caulobacterales/Hyphomonadaceae/Hyphomonas
library(data.tree)
test2$pathString <- with(test2,
paste(phylum,
class,
order,
family,
genus, sep = "/"))
tree_test2 = as.Node(test2)
plot(tree_test2)
Many things can be done afterwards, for example:
Interactive network:
library(networkD3)
test2_Network <- ToDataFrameNetwork(tree_test2, "name")
simpleNetwork(test2_Network)
or graph styled
library(igraph)
plot(as.igraph(tree_test2, directed = TRUE, direction = "climb"))
check out the vignette
using ggplot2:
library(ggraph)
graph = as.igraph(tree_test2, directed = TRUE, direction = "climb")
ggraph(graph, layout = 'kk') +
geom_node_text(aes(label = name))+
geom_edge_link(arrow = arrow(type = "closed", ends = "first",
length = unit(0.20, "inches"),
angle = 15)) +
geom_node_point() +
theme_graph()+
coord_cartesian(xlim = c(-3,3), expand = TRUE)
or perhaps:
ggraph(graph, layout = 'kk') +
geom_node_text(aes(label = name), repel = T)+
geom_edge_link(angle_calc = 'along',
end_cap = circle(3, 'mm'))+
geom_node_point(size = 5) +
theme_graph()+
coord_cartesian(xlim = c(-3,3), expand = TRUE)
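On the question of artificial distances: none are needed, because all of the plots above rest only on parent-child links between adjacent taxonomic ranks. A base R sketch of that edge list, built directly from the rank columns, may make this concrete. It uses three rows copied from test2 (the data frame taxa is a name chosen here) and assumes that each name is unique across ranks and branches, which would not hold if, say, "uncultured" appeared under several families.

```r
# Three rows taken from the question's test2 data
taxa <- data.frame(
  phylum = "Proteobacteria",
  class  = "Alphaproteobacteria",
  order  = c("Caulobacterales", "Rhizobiales", "Rhizobiales"),
  family = c("Hyphomonadaceae", "Bradyrhizobiaceae", "Phyllobacteriaceae"),
  genus  = c("Hyphomonas", "Bradyrhizobium", "Mesorhizobium"),
  stringsAsFactors = FALSE
)

ranks <- c("phylum", "class", "order", "family", "genus")

# For every pair of adjacent ranks, record parent -> child links
edges <- do.call(rbind, lapply(seq_len(length(ranks) - 1), function(i) {
  data.frame(from = taxa[[ranks[i]]], to = taxa[[ranks[i + 1]]],
             stringsAsFactors = FALSE)
}))

# Drop links that several organisms share
edges <- unique(edges)
edges
```

An edge list like this can be handed to igraph::graph_from_data_frame if the data.tree step is not wanted.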
I have a data set with a single identifier and five columns that repeat 18 times. I want to restructure the data into long format keeping the first five column headings as the column headings. Below is a sample with just two repeats:
structure(list(Response.ID = 1:2, Task = structure(c(1L, 1L), .Label = "task1", class = "factor"),
Freq = structure(c(1L, 1L), .Label = "Daily", class = "factor"),
Hours = c(3L, 2L), Value = c(10L, 8L), Mood = structure(1:2, .Label = c("Engaged",
"Neutral"), class = "factor"), Task.1 = structure(c(1L, 1L
), .Label = "task2", class = "factor"), Freq.1 = structure(c(1L,
1L), .Label = "Weekly", class = "factor"), Hours.1 = c(4L,
4L), Value.1 = c(10L, 6L), Mood.1 = structure(c(2L, 1L), .Label = c("Neutral",
"Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood", "Task.1", "Freq.1", "Hours.1", "Value.1", "Mood.1"), class = "data.frame", row.names = c(NA, -2L))
I attempted using the melt and patterns functions, which appear to approximate my desired outcome, but without the desired column headings:
df = melt(df1, id.vars = c("Response.ID"), measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"))
Here is the result:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), variable = structure(c(1L, 1L, 2L, 2L), class = "factor", .Label = c("1", "2")), value1 = c("task1", "task1", "task2", "task2"), value2 = c("Daily", "Daily", "Weekly", "Weekly"), value3 = c(3L, 2L, 4L, 4L), value4 = c("Engaged", "Neutral", "Optimistic", "Neutral")), .Names = c("Response.ID", "variable", "value1", "value2", "value3", "value4"), row.names = c(NA, -4L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000330788>)
When I tried to specify names with the value.name argument below, I received an error:
df = melt(df1, id.vars = c("Response.ID"),measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"), value.name=c("Task", "Freq", "Hours", "Value","Mood"))
My desired result would look like this:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), Task = structure(c(1L, 1L, 2L, 2L), .Label = c("task1", "task2"), class = "factor"),
Freq = structure(c(1L, 1L, 2L, 2L), .Label = c("Daily", "Weekly"
), class = "factor"), Hours = c(3L, 2L, 4L, 4L), Value = c(10L,
8L, 10L, 6L), Mood = structure(c(1L, 2L, 3L, 2L), .Label = c("Engaged",
"Neutral", "Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood"), class = "data.frame", row.names = c(NA, -4L))
It looks to me like you embarked on a difficult journey by using melt: this function is well named in the sense that trying to use it will probably melt your brain. Joke aside, the function melt has lots of underlying computations and its use could be inefficient if you have a large dataset.
I would instead solve the problem manually with rbindlist (from the excellent package data.table, which also ships with an optimized version of melt if you really want to use it) to concatenate groups of columns. This also preserves the column names:
> rbindlist(lapply(1:2, function(i) df1[,c(1,((i-1)*5+2):((i-1)*5+6))]))
Response.ID Task Freq Hours Value Mood
1: 1 task1 Daily 3 10 Engaged
2: 2 task1 Daily 2 8 Neutral
3: 1 task2 Weekly 4 10 Optimistic
4: 2 task2 Weekly 4 6 Neutral
This works on your example. For the real dataset, replace the indices 1:2 with 1:18 (the number of repetitions), i.e. lapply(1:18, ...).
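If melt is still preferred, the error in the question most likely comes from a mismatch: the call lists four patterns (there is no "^Value") but five names in value.name. A sketch with the fifth pattern added, assuming data.table is installed and using a hand-built copy of the sample data (wide and long are names chosen here):

```r
library(data.table)

# Hand-built copy of the two-repeat sample from the question
wide <- data.table(
  Response.ID = 1:2,
  Task = "task1", Freq = "Daily", Hours = c(3L, 2L),
  Value = c(10L, 8L), Mood = c("Engaged", "Neutral"),
  Task.1 = "task2", Freq.1 = "Weekly", Hours.1 = c(4L, 4L),
  Value.1 = c(10L, 6L), Mood.1 = c("Optimistic", "Neutral")
)

# One pattern per output column, in the same order as value.name
long <- melt(
  wide,
  id.vars = "Response.ID",
  measure.vars = patterns("^Task", "^Freq", "^Hours", "^Value", "^Mood"),
  value.name = c("Task", "Freq", "Hours", "Value", "Mood")
)

# melt adds a "variable" column holding the repeat index; drop it if unwanted
long[, variable := NULL]
long
```

The number of patterns and the number of entries in value.name have to match, which is why adding "^Value" is the key change here.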
If I want to rename the columns from the second one to the last, why do my commands not work?
fredTable <- structure(list(Symbol = structure(c(3L, 1L, 4L, 2L, 5L), .Label = c("CASACBM027SBOG",
"FRPACBW027SBOG", "TLAACBM027SBOG", "TOTBKCR", "USNIM"), class = "factor"),
Name = structure(1:5, .Label = c("bankAssets", "bankCash",
"bankCredWk", "bankFFRRPWk", "bankIntMargQtr"), class = "factor"),
Category = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Banks", class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "USA", class = "factor"),
Lead = structure(c(1L, 1L, 3L, 3L, 2L), .Label = c("Monthly",
"Quarterly", "Weekly"), class = "factor"), Freq = structure(c(2L,
1L, 3L, 3L, 4L), .Label = c("1947-01-01", "1973-01-01", "1973-01-03",
"1984-01-01"), class = "factor"), Start = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "Current", class = "factor"), End = c(TRUE,
TRUE, TRUE, TRUE, FALSE), SeasAdj = c(FALSE, FALSE, FALSE,
FALSE, TRUE), Percent = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Fed", class = "factor"),
Source = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Res", class = "factor"),
Series = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Level",
"Ratio"), class = "factor")), .Names = c("Symbol", "Name",
"Category", "Country", "Lead", "Freq", "Start", "End", "SeasAdj",
"Percent", "Source", "Series"), row.names = c("1", "2", "3",
"4", "5"), class = "data.frame")
To rename the columns from the second to the last, I used the following, but it does not work:
names(fredTable[,-1]) = paste("case", 1:ncol(fredTable[,-1]), sep = "")
or
names(fredTable)[,-1] = paste("case", 1:ncol(fredTable)[,-1], sep = "")
In general, how can one rename specific columns (for example, columns 2 to the end, or columns 2 to 7) and set the names to whatever one likes?
Replace specific column names by subsetting on the outside of the function, not within the names function as in your first attempt:
> names(fredTable)[-1] <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
Explanation
If we save the new names in a vector newnames we can investigate what is going on under the hood with replacement functions.
#These are the names that will replace the old names
newnames <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
We should always replace specific column names with the format:
#The right way to replace the second name only
names(df)[2] <- "newvalue"
#The wrong way
names(df[2]) <- "newvalue"
The problem is that you are attempting to create a new vector of column names then assign the output to the data frame. These two operations are simultaneously completed in the correct replacement.
The right way [Internal]
We can expand the function call with:
#We enter this:
names(fredTable)[-1] <- newnames
#This is carried out on the inside
fredTable <- `names<-`(fredTable, value = `[<-`(names(fredTable), -1, value = newnames))
The wrong way [Internal]
The internals of the wrong replacement look like this:
#Wrong way
names(fredTable[-1]) <- newnames
#Wrong way Internal
fredTable <- `[<-`(fredTable, -1, value = `names<-`(fredTable[-1], value = newnames))
Notice that `names<-` is applied only to the temporary subset fredTable[-1]. That renamed copy is then written back with `[<-`, which replaces the column values but keeps the existing column names of fredTable, so the new names are lost.
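The difference is easy to verify on a small data frame. The sketch below uses a made-up four-column data frame df (not fredTable) and also shows renaming a range of columns, as asked in the question:

```r
# Small stand-in data frame
df <- data.frame(id = 1:2, a = 3:4, b = 5:6, c = 7:8)

# Wrong way: runs without error but leaves the names untouched
names(df[-1]) <- c("x", "y", "z")
after_wrong <- names(df)

# Right way: subset the result of names(), not the data frame
names(df)[-1] <- paste0("case", seq_len(ncol(df) - 1))
after_right <- names(df)

# Renaming a range of columns works the same way, here columns 2 to 3
names(df)[2:3] <- c("first", "second")
names(df)
```

For columns 2 to 7 of a wider data frame, names(df)[2:7] <- new_names follows the same pattern, where new_names is any character vector of length six.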