Best way to transform source data? - r

I'm working in R, but I think this question is universal.
The Wall Street Journal visualized a dataset on disease infection rates in the U.S.:
The x-axis is year and the y-axis is state.
The shade of red in each tile shows the infection rate for that state in that year.
The source dataset being visualized is arranged as follows:
Each row in the dataset corresponds to a single infection rate for a single state in a given year. So each red tile in the visualization corresponds to one row of the dataset.
But what if the dataset looked like this?
Now each row corresponds to a state, and each state/row has multiple infection rates, one for each recorded year. This may match how data is captured in the real world: for each year (or each day, in the case of coronavirus) that you track the infection rate, you just add a new column rather than 50 new rows.
The problem is that, while this layout is more human-friendly, it's not very R-friendly. We can easily create the tile visualization from the first arrangement, where each row is one infection rate, but not so easily when the data is arranged by state.
So, finally, my question is: is there an easy way to transform data from the second layout to the first, in Excel?

You can use the transpose function in the free, open-source OpenRefine tool to prepare your data file prior to loading it into R.
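You can also do the same wide-to-long reshape in code instead of a separate tool. In R, this is usually done with tidyr's pivot_longer(). Below is a minimal sketch of the same idea in plain Python. It assumes a hypothetical wide table with a `state` column followed by one column per year, and the state names and rates are made-up placeholders.

```python
# Made-up wide table: one row per state, one column per year.
wide = [
    {"state": "State A", "1930": 1.0, "1931": 2.0},
    {"state": "State B", "1930": 3.0, "1931": 4.0},
]

def wide_to_long(rows, id_col="state"):
    """Turn one row per state into one row per (state, year) pair."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier column is copied, not melted
            long_rows.append({id_col: row[id_col], "year": int(col), "rate": value})
    return long_rows

long_rows = wide_to_long(wide)
for r in long_rows:
    print(r)
```

Each output row is one (state, year, rate) triple, which matches one tile of the heat map and the first layout in the question.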

Related

Determine the proportion of the data information in R

Suppose I have a plot like the following:
I want to find the region where the majority (say 90%) of the data lies. For example, I want to isolate part of the plot like this:
where the points inside the black frame account for 90% of the data.
How can I do this in R?
Edited for comment:
What if I have the following plot? Here the majority of the data probably starts from 0.
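Here is one possible way to build such a frame, sketched in Python rather than R. The key assumption is how "majority" is defined. This sketch takes it to mean the 90% of points closest to the median point, after each axis is scaled by its standard deviation so that neither axis dominates because of its units. It then returns the bounding box of those points. That box always contains at least 90% of the points, and it may contain a few more that happen to fall inside it.

```python
import math
import statistics

def frame_90(points, share=0.9):
    """Return (xmin, xmax, ymin, ymax) of a box holding at least `share` of the points.

    Assumption: the 'majority' is the points closest to the median point,
    measured after scaling each axis by its population standard deviation.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = statistics.median(xs), statistics.median(ys)
    sx = statistics.pstdev(xs) or 1.0  # avoid dividing by zero on a flat axis
    sy = statistics.pstdev(ys) or 1.0

    def dist(p):
        # Scaled Euclidean distance from the median point.
        return math.hypot((p[0] - cx) / sx, (p[1] - cy) / sy)

    keep = sorted(points, key=dist)[: math.ceil(share * len(points))]
    kx = [p[0] for p in keep]
    ky = [p[1] for p in keep]
    return min(kx), max(kx), min(ky), max(ky)

# Usage: nine ordinary points plus one far outlier.
points = [(x, x) for x in range(1, 10)] + [(1000, 1000)]
print(frame_90(points))
```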

Discrete vs continuous (dimension vs measure) for dates in tableau

I'm currently learning Tableau and created a graph to map unemployment by month. Because the time period was a measure, I dragged it in. This aggregated the data by month (all years together) and displayed a graph like this:
To plot it monthly for each successive year, I searched online and found a conversion method where the same MONTH(PERIOD) was converted to a measure, which resulted in a plot like this:
Could someone please explain why this works, and how a measure, rather than the month dimension, makes the data display this way? I cannot understand the plots when both axes are measures.
Thank you
...saw the conversion method where the same MONTH(PERIOD) was converted to a measure.
It's not converted to a measure. Period is still a dimension, just a Continuous one. Measures and dimensions in Tableau can be either Continuous or Discrete. The ability to convert one to another depends on the data type of your field. For example, Dates can be converted between Continuous and Discrete, but Strings can't. The green colour isn't for measures - it's for the Continuous type of either measure or dimension.

How to create bar graph from qualitative data in R?

I have two columns of Olympic data: one showing the name of the team and one showing the type of medal won. I am trying to calculate the number of medals won by each team and then plot a bar graph from the data.
I do not know how best to go about it. Team names repeat because they were extracted from a bigger dataset in which each row records one participant's nationality. I want to calculate totals for each team, but there are many teams, so I want to show only the top 10. I would therefore also need help ranking them.
Any help would be appreciated.
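Here is a minimal sketch of the count-and-rank step, in Python rather than R. In R, the same idea is a group-by-team count followed by sorting. The rows below are made-up placeholders, and the sketch assumes that participants without a medal are recorded as None.

```python
from collections import Counter

# Made-up placeholder rows: (team, medal); None means no medal was won.
rows = [
    ("Team A", "Gold"), ("Team A", "Silver"), ("Team A", None),
    ("Team B", "Bronze"), ("Team B", "Gold"),
    ("Team C", "Gold"),
]

def top_teams(rows, n=10):
    """Count medals per team (ignoring None) and return the n best teams."""
    counts = Counter(team for team, medal in rows if medal is not None)
    return counts.most_common(n)  # sorted by medal count, highest first

ranking = top_teams(rows)
# Crude text bar chart of the ranking; a plotting library would replace this.
for team, total in ranking:
    print(f"{team:<8} {'#' * total} {total}")
```

Counting first and ranking second keeps the top-10 cut independent of how many raw rows each team has.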

How to show actual values instead of sum/average on y-axis in a plot using Power BI

I have an Excel sheet from which I am plotting two values over time. But when I plot it in Power BI, the y-axis shows the sum of those values rather than the actual values. The plot looks like the following.
The y-axis values are in the billions, but the actual data below is in the hundred-thousands range.
If I use the date hierarchy instead, this is how it looks:
Since your data is on specific days, you should use dates on your axis rather than weekends. When you have weekends on the axis, it groups all the days within that week together, which is not what you want.
Edit:
After looking at your file, it has nothing to do with dates after all. The data appears to be recorded at a weekly level. The reason it's adding up like that is that each date has several rows, one per geography (your geo column has state name abbreviations).
If you add a slicer for geo and look at the chart for a single state at a time, then I think you'll get more what you're expecting.
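You can reproduce the effect described above with a few made-up rows, sketched here in Python. Summing every row for a date adds up all states. Filtering to one state first, which is roughly what the geo slicer does, gives that state's own value. The dates, state codes, and values are placeholders.

```python
from collections import defaultdict

# Made-up weekly rows: (date, geo, value). Several geos share each date.
rows = [
    ("2020-01-04", "CA", 100), ("2020-01-04", "NY", 200), ("2020-01-04", "TX", 300),
    ("2020-01-11", "CA", 110), ("2020-01-11", "NY", 210), ("2020-01-11", "TX", 310),
]

def total_by_date(rows, geo=None):
    """Sum values per date, optionally keeping only one geo (like a slicer)."""
    totals = defaultdict(int)
    for date, g, value in rows:
        if geo is None or g == geo:
            totals[date] += value
    return dict(totals)

print(total_by_date(rows))        # {'2020-01-04': 600, '2020-01-11': 630}
print(total_by_date(rows, "CA"))  # {'2020-01-04': 100, '2020-01-11': 110}
```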

Plotting multiple lines in Tableau

I read every possible forum and I couldn't find a specific answer. I'm new to Tableau and I need to perform what I thought would be a very simple task but I can't figure it out.
I need to create a chart with multiple lines plotted on the same graph. In one column I have timestamps in seconds (decimal). For each timestamp value I have four associated columns (Temperature, Pressure, Humidity, Voltage), and I need to visualize how they trend over time. The data in Excel looks something like this (simplified for visualization purposes):
In Excel, it takes me a few seconds to get a chart that looks like this:
How can I replicate the same exact chart in Tableau? I would like someone to tell me exactly how each column should be formatted, if the data has to be a dimension or a measure and data type for each (string, number etc) and what steps to take in the chart. I would do this in Excel but the file is almost 1M rows and Excel keeps crashing.
Drag and drop Measure Values and Timestamp.
EDIT - first make Timestamp a dimension instead of a measure, and make sure it remains continuous instead of discrete. You should then no longer see the word SUM in front of Timestamp.
Remove the number-of-rows field and Timestamp from Measure Values.
Uncheck Aggregate Measures.
EDIT this step is not usually needed or desirable. If you do have multiple recordings per Timestamp, you will want to select the best aggregation function for each measure, perhaps AVG or MEDIAN instead of SUM.
Select Line on the Marks card and drag Measure Names to Color.
If your x-axis is a date column (i.e., you have a list of dates on your x-axis):
Drag your date column to Columns, choose Month/Year/Week etc., and make it continuous
Drag 'Measure Names' to Filters and choose your Measures
Drag 'Measure Values' to Rows
Drag 'Measure Names' to the Colors section of the Marks card
