Problems with a scatterplot in R

I'm trying to visualize the correlation between two columns in my dataset.
I tried plot() and scatterplot(), but the result is not a readable graph.
For example, I used this function:
scatter.smooth(x=Lifestyles$SLEEP_HOURS, y=Lifestyles$SUFFICIENT_INCOME, main="sleep hours and Income", xlab = "Sleep hours", ylab = "income, 1,2")
About the dataset:
I have about 12000 observations and 20 columns.
Both columns are numeric (integer-valued).
Here I'm trying to relate the number of sleep hours to how many tasks are completed daily.
Link to my dataset: https://www.kaggle.com/ydalat/lifestyle-and-wellbeing-data
Thank you all in advance!
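Part of the problem is probably overplotting: with about 12000 rows and integer-valued columns, many points land on exactly the same coordinates, so the plot shows only a small grid of dots. A minimal base-R sketch of one common remedy, using simulated stand-in data (the vectors sleep and income and their ranges are assumptions, not the Kaggle values): jitter the points, make them semi-transparent, and summarise the association with a rank correlation and a count table.

```r
# Simulated stand-in for Lifestyles$SLEEP_HOURS and Lifestyles$SUFFICIENT_INCOME
# (hypothetical ranges; replace with the real columns)
set.seed(1)
n <- 12000
sleep  <- sample(4:10, n, replace = TRUE)
income <- sample(1:2, n, replace = TRUE)

# Jitter both axes slightly and use semi-transparent points, so that
# thousands of identical (x, y) pairs show up as denser clouds
plot(jitter(sleep, amount = 0.2), jitter(income, amount = 0.1),
     pch = 16, col = rgb(0, 0, 1, alpha = 0.05),
     xlab = "Sleep hours", ylab = "Income (1, 2)",
     main = "Sleep hours and income")

# Spearman (rank) correlation suits ordinal integer data
rho <- cor(sleep, income, method = "spearman")

# Number of observations for each (sleep, income) combination
counts <- as.data.frame(table(sleep = sleep, income = income))
```

For a two-level variable such as income, boxplot(sleep ~ income) is another readable alternative to a scatterplot.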

Related

Counts, bars, bins for each pandas DataFrame histogram subplot

I am making separate histograms of travel distance per departure hour. For further calculations, however, I'd like the value of each bin in every histogram.
Up until now, I have the following:
df['Distance'].hist(by=df['Departuretime'], color='red', edgecolor='black',
                    figsize=(15, 15), sharex=True, density=True)
In my case this creates a figure with 21 small histograms.
With a single histogram I'd put counts, bins, bars = in front of the whole line, and counts would then contain the data I'm looking for. In this case, however, that does not work.
Ideally I'd like a dataframe or list of some sort for each histogram, containing the density values of the bins. I hope someone can help me out! Thanks in advance!
Edit:
Data I'm using (about 2500 columns of this); Distance is float64 and Departuretime is str:
Histogram output I'm receiving
For all of these histograms I want the y-axis value of each bar, preferably in a DataFrame with the distance bins as rows and the hours as columns.
Using the cut function, you can extract the requested data directly from your DataFrame instead of reading it off the graph. This is less error-prone.
import pandas as pd
df['DistanceBin'] = pd.cut(df['Distance'], bins=10)
Then use pivot_table to get the count for each combination of DistanceBin (rows) and Departuretime (columns), as you asked. Note that these are raw counts; the histograms above use density=True, so their bar heights are scaled differently.
df.pivot_table(index='DistanceBin', columns='Departuretime', values='Distance', aggfunc='count', observed=False)

How to plot three barplots over time on the same axis

I have a data frame (df) that contains time (five-minute intervals) and the number of animals passing by at three different locations. Counting starts and ends at the same time at all three places. Because of how the places are aligned in space, the animals pass them at different times: first place1, then place2, then place3.
Today I get:
Number of animals on the y-axis and time on the x-axis (with three separate groupings, as I want). However, time starts over again for each of the three groupings, so the x-axis looks like start-end, start-end, start-end.
What I want:
A plot with the number of counts on the y-axis and time on the x-axis (from the start to the end of counting). There will then be a "displacement" in the number of animals (possibly fewer as well) as time goes by.
My code today is:
barplot(df$nr.animals, names.arg = df$time, xlab="Time", ylab="Number of animals counted")
How can I get this data into a barplot with time on the x-axis (without the time starting at the beginning again for each new place)?
EDITED:
See the picture for part of my dataset:
time = time interval; pos has three values, mmys1, mmys_c1 and mmys_c2, which stand for the three places; nr.bats.dir1 = number of animals counted in that time interval.
See the picture of the structure of my dataset:
time is a factor and nr.bats.dir1 is an integer.
So my code is actually:
barplot(df$nr.bats.dir1, names.arg = df$time, xlab="Time", ylab="Number of bats counted")
Thanks!
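The x-axis restarts because barplot() draws one bar per row of df, and df lists the rows place after place. One way around this, sketched below with a small made-up data frame that only mimics the columns described (time, pos, nr.bats.dir1; the times and counts are invented), is to sum the counts into a place-by-time table with xtabs(), so every time step appears exactly once, and then draw grouped bars.

```r
# Made-up stand-in for df with the same three columns as in the question
times  <- sprintf("22:%02d", seq(0, 25, by = 5))   # six 5-minute intervals
places <- c("mmys1", "mmys_c1", "mmys_c2")
df <- data.frame(
  time = factor(rep(times, times = 3), levels = times),
  pos  = factor(rep(places, each = 6), levels = places),
  nr.bats.dir1 = c(5, 9, 4, 2, 0, 0,    # mmys1: animals pass here first
                   0, 3, 8, 6, 2, 0,    # mmys_c1
                   0, 0, 2, 7, 9, 3)    # mmys_c2: passed last
)

# Sum counts into a table with one row per place and one column per time,
# so each time step appears exactly once on the x-axis
tab <- xtabs(nr.bats.dir1 ~ pos + time, data = df)

# Grouped bars: for every time step, one bar per place
barplot(tab, beside = TRUE, col = c("grey30", "steelblue", "orange"),
        legend.text = rownames(tab), las = 2,
        xlab = "Time", ylab = "Number of bats counted")
```

Because time is a factor, its levels must be in chronological order for the axis to read correctly; if they are not (for example when counting crosses midnight), set them explicitly with factor(..., levels = ...). Dropping beside = TRUE stacks the three places into one bar per time step instead.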

Creating vectors of equal length from R dataset and plotting them

R has some built-in datasets; I'm using "lynx" and "LakeHuron", which have different lengths. "lynx" contains annual lynx trappings for 1821-1934, and "LakeHuron" contains the annual water level for 1875-1972.
I need to plot LakeHuron data on the y-axis, and lynx data on the x-axis, but only for the years 1875-1934 inclusive. I created two vectors:
lynx.years = c(lynx)
huron.years = c(LakeHuron)
I am stuck at making the plot for only the specified year range. Can someone help me plot the data from the two vectors for only the years 1875-1934?
Thank you!
1) The question did not specify what sort of plot is desired, so assume it is a two-panel plot with one series per panel and years on the X axis, showing only the range of years mentioned. That range is the intersection of the years of the two series, so:
plot(na.omit(cbind(LakeHuron, lynx)))
Drop the na.omit if you want to plot the entirety of the two series.
2) If what is wanted is to rescale the two series so that their shapes can be shown on a single panel despite vastly different ranges:
ts.plot(scale(na.omit(cbind(LakeHuron, lynx))), col = 1:2)
Again, we could drop the na.omit if the entire series were desired.
3) If what is wanted is to plot one against the other (lynx on the X axis, LakeHuron on the Y axis), then:
plot(unclass(cbind(lynx, LakeHuron)))
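The question's c(lynx) and c(LakeHuron) strip the time-series attributes, so the years are lost and the two vectors can no longer be aligned. Another option is to keep them as ts objects and cut both to 1875-1934 with window() from the stats package before plotting one against the other. A short sketch (the names lynx.sub, huron.sub and both are chosen here for illustration):

```r
# window() keeps the year information and restricts each series to 1875-1934
lynx.sub  <- window(lynx,      start = 1875, end = 1934)
huron.sub <- window(LakeHuron, start = 1875, end = 1934)

# Lynx on the x-axis, Lake Huron level on the y-axis, one point per year
plot(as.numeric(lynx.sub), as.numeric(huron.sub),
     xlab = "Lynx trappings", ylab = "Lake Huron level (ft)",
     main = "Lake Huron level vs. lynx trappings, 1875-1934")

# ts.intersect() performs the same alignment automatically,
# returning a two-column series covering only the common years
both <- ts.intersect(lynx, LakeHuron)
```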

R: plot multiple lines in different colours from subset of database

I've created a database with six different countries and multiple GDP and inequality measures.
For starters, I want to plot the GDP growth of the countries in one plot. This works out perfectly fine:
plot(my_six_countries$Year, my_six_countries$GDP.growth.rate, main = "Development of GDP growth", xlab = "Year", ylab = "GDP growth", type = "l", col = 600)
However, I want the lines for the different countries to be displayed in different colours, not all in colour 600. I spent virtually the whole day on this super nooby problem and tried all sorts of things, from creating a colour vector to subsetting manually to playing with ggplot, but I'm really stuck.
Any idea how the lines could be displayed in different colours?
Thank you so much!
I ended up using a much less elegant method, but it worked.
Firstly, I subsetted my countries.
c1 <- subset(countries,countries$Country=="c1")
c2 <- subset(countries,countries$Country=="c2")
c3 <- subset(countries,countries$Country=="c3")
Secondly, I plotted the lines one by one.
plot(c1$Year, c1$GDP, type = "l", bty = "l", col = "brown", ylim = range(countries$GDP, na.rm = TRUE))
lines(c2$Year, c2$GDP, col="cornflowerblue")
lines(c3$Year, c3$GDP, col="darkblue")
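A loop avoids writing one subset per country. The sketch below assumes my_six_countries is in long format with Country, Year and GDP.growth.rate columns; the six country labels and all values are invented for illustration.

```r
# Invented stand-in for my_six_countries: one row per country and year
set.seed(42)
my_six_countries <- data.frame(
  Country = rep(c("A", "B", "C", "D", "E", "F"), each = 10),
  Year = rep(2001:2010, times = 6),
  GDP.growth.rate = round(rnorm(60, mean = 2, sd = 1.5), 2)
)

countries <- unique(my_six_countries$Country)
cols <- rainbow(length(countries))   # one colour per country

# Empty plot sized to all the data, so no country's line is clipped
plot(range(my_six_countries$Year), range(my_six_countries$GDP.growth.rate),
     type = "n", main = "Development of GDP growth",
     xlab = "Year", ylab = "GDP growth")

# Draw one line per country in its own colour
for (i in seq_along(countries)) {
  d <- my_six_countries[my_six_countries$Country == countries[i], ]
  d <- d[order(d$Year), ]
  lines(d$Year, d$GDP.growth.rate, col = cols[i])
}
legend("topright", legend = countries, col = cols, lty = 1)
```

With ggplot2, mapping colour = Country inside aes() and adding geom_line() gives one coloured line per country without the loop.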

Plot boxplots and line of time series data in R

I want to combine a time series of in situ values (a line) with boxplots of estimated values on particular dates. I tried to understand the question "Add a line from different result to boxplot graph in ggplot2", but my dates are driving me crazy. Sometimes I only have in situ values for a date, sometimes only estimated values, and sometimes both.
I uploaded a sample of my data here:
http://www.file-upload.net/download-9942494/estimated.txt.html
http://www.file-upload.net/download-9942495/insitu.txt.html
How can I create a plot with both data sets that, in the end, looks like this?
http://www.file-upload.net/download-9942496/desired_outputplot.png.html
I got help and have a solution now:
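library(latticeExtra)  # provides `+` for overlaying trellis plots; also attaches lattice
insitu <- read.table("insitu.txt", header = TRUE, colClasses = c("Date", "numeric"))
est <- read.table("estimated.txt", header = TRUE, colClasses = c("Date", "numeric"))
# In situ values as a line, drawn over a background grid
insitu.plot <- xyplot(insitu ~ date_fname, data = insitu, type = "l",
                      panel = function(x, y, ...) { panel.grid(); panel.xyplot(x, y, ...) },
                      xlab = list(label = "Date", cex = 2))
# Estimated values as vertical boxplots at each date
est.plot <- xyplot(estimated ~ date, data = est, panel = panel.bwplot, horizontal = FALSE)
both <- insitu.plot + est.plot
# Widen the axes so that both data sets are fully visible
update(both, xlim = range(c(est$date, insitu$date_fname)) + c(-1, 1),
       ylim = range(c(est$estimated, insitu$insitu)))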
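For comparison, a base-graphics sketch of the same idea without lattice: draw the in situ series with plot(), then overlay boxplot(..., add = TRUE) with at set to the numeric value of each estimation date, so each box sits at its date on the time axis. The column names follow the solution above; the data are simulated stand-ins for the two uploaded files, and the dates are hypothetical.

```r
# Simulated stand-ins for insitu.txt and estimated.txt (same column names)
set.seed(7)
insitu <- data.frame(date_fname = as.Date("2014-05-01") + 0:29,
                     insitu = 10 + sin(seq(0, 3, length.out = 30)))
est <- data.frame(
  date = rep(as.Date(c("2014-05-05", "2014-05-15", "2014-05-25")), each = 8),
  estimated = 10 + rnorm(24, sd = 0.5)
)

# Axis ranges covering both data sets, padded by one day on each side
xr <- range(c(insitu$date_fname, est$date)) + c(-1, 1)
yr <- range(c(insitu$insitu, est$estimated))

# In situ values as a line; the x-axis is labelled with dates
plot(insitu$date_fname, insitu$insitu, type = "l",
     xlim = as.numeric(xr), ylim = yr, xlab = "Date", ylab = "Value")

# One box per estimation date, drawn at that date's position on the x-axis
est.dates <- sort(unique(est$date))
boxplot(estimated ~ date, data = est, at = as.numeric(est.dates),
        add = TRUE, boxwex = 2, axes = FALSE, col = "lightgrey")
```

Dates that have only in situ values simply get no box, and dates that have only estimates get a box with no line underneath, which matches the mixed coverage described in the question.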
