Recent comments in /f/dataisbeautiful
CrabHerdPeril t1_j6fnwuf wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
You need to use different colors for the top and bottom section. Took me a full 60 seconds to understand anything.
TisButA-Zucc t1_j6fjeub wrote
Reply to comment by Sisyphuss5MinBreak in [OC] Where are Redditors from? Source link in comments by iFoegot
Are desktops really a sign of a more developed nation compared to mobiles? Don't underestimate the concept of internet cafés, or the number of Microsoft offices in India, for example. I would argue desktops have a wider budget range than smartphones; they can be incredibly cheap.
JPAnalyst t1_j6fj559 wrote
Reply to Guess which one is JJJ... [OC] by michaelGaryScarn009
What’s the R-squared without JJJ?
Great post BTW
michaelGaryScarn009 OP t1_j6fimqx wrote
Reply to Guess which one is JJJ... [OC] by michaelGaryScarn009
What
A well-researched Reddit post identified a statistical anomaly in the home vs away steals and blocks for DPOY favorite Jaren Jackson Jr. (JJJ). (link)
Why
It's several things I like (basketball, variance, Reddit, etc) wrapped up in an opportunity to make a data viz.
Graph
- Per 36 min steals + blocks (stocks)
- Size of point = absolute difference of home and away average
Conclusion
Just looking at the viz, yeah, this is exceptional behavior. Others have determined there's a sub-1% chance that this variation is random. The NBA has clarified that an NBA representative, and not the local team, makes the official stat determinations. NBA media has reviewed the calls for JJJ, and has only found several questionable calls. But, we're just looking at data.
Data
- From NBA.com on 01-29-2023
- Top 50 players in per 36 min steals and blocks
-- Home and away
-- At least 8 games played for home and away
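A rough sketch of the quantities described under Graph and Data, in Python. It assumes the point size is the absolute difference between the home and away per-36 rates; the dict layout, the function names, and the sample numbers are all made up for illustration and are not from NBA.com.

```python
MIN_GAMES = 8  # threshold from the Data notes: at least 8 games for home and away

def per36(total, minutes):
    """Scale a counting stat to a per-36-minute rate."""
    return 36.0 * total / minutes

def qualifies(home, away):
    """Keep a player only if both splits meet the games-played threshold."""
    return home["games"] >= MIN_GAMES and away["games"] >= MIN_GAMES

def stocks_split(home, away):
    """Return (home rate, away rate, absolute difference) of per-36 stocks."""
    h = per36(home["steals"] + home["blocks"], home["minutes"])
    a = per36(away["steals"] + away["blocks"], away["minutes"])
    return h, a, abs(h - a)

# Hypothetical totals for one player, NOT real data.
home = {"games": 12, "steals": 20, "blocks": 52, "minutes": 360}
away = {"games": 12, "steals": 10, "blocks": 26, "minutes": 360}

if qualifies(home, away):
    home_rate, away_rate, point_size = stocks_split(home, away)
    print(home_rate, away_rate, point_size)
```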
TisButA-Zucc t1_j6fi57a wrote
So, officially, Americans on reddit can't assume that their fellow redditors are also American? Neat.
tron_oce t1_j6feq2c wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
Should have a line for non big 3 winners over this time period
0tt0attack t1_j6fbrz7 wrote
This does not show anything besides population numbers.
myrianthe t1_j6faxsp wrote
Reply to comment by MrLagzy in Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter
Fascinating. I wonder how they’re pronounced?
dessertandcheese t1_j6f9ddq wrote
Reply to comment by Specialist_Cow2011 in Portugal Population Density Map, done with R using the Rayshader package. [OC] by Specialist_Cow2011
Hi! I'm planning to do that as well. Did you find it useful and is it easy enough if you have no background?
gechu t1_j6f9ayd wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
It would seem the US Open is the hardest venue to dominate consistently
[deleted] t1_j6f4gno wrote
Reply to comment by twintig5 in [OC] The Big Three - Grand Slams overview by twintig5
[removed]
twintig5 OP t1_j6f3f49 wrote
Reply to comment by qwerty6731 in [OC] The Big Three - Grand Slams overview by twintig5
Nice idea, thanks for the feedback. Will do something like that next time.
qwerty6731 t1_j6f35mb wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
The double use of the same colours is confusing. Also, the legend for the lines is too small and out of the way.
Suggestion: Keep the three colours for the players, enlarge the legend and bring it closer to the lines, then change the bottom win distributions to monochrome shades by tournament, based on the colour associated with each player.
[deleted] t1_j6f2b9u wrote
Reply to comment by rockstoagunfight in [OC] The Big Three - Grand Slams overview by twintig5
[removed]
AvailableUsername404 t1_j6f27h4 wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
When three of them have won a cumulative 64 Grand Slam tournaments in 20 years but none of them has The Grand Slam.
Fun fact: the last man to achieve The Grand Slam in singles did it in 1969.
[deleted] t1_j6f1ww1 wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
[removed]
rockstoagunfight t1_j6ezr04 wrote
Reply to [OC] The Big Three - Grand Slams overview by twintig5
Personally I don't like that the 2 charts use the same greenish and blueish colours. I get that it would look less cohesive with 7 colours though.
AnonymousShitposter6 t1_j6ewc6p wrote
I personally prefer John 1:14
gnomeba t1_j6etjv4 wrote
Reply to Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter
This is great. Are the probabilities conditioned on the total distribution of the letters? If not, this might make a more realistic fake name generator.
kilopeter OP t1_j6eq2tj wrote
Reply to comment by ghostfaceschiller in Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter
Oh nice, I'll let you know!
PartisanPlayground OP t1_j6eo5hz wrote
Reply to comment by ianhillmedia in [OC] How news stories evolve in the news cycle by PartisanPlayground
You're hitting on the most subjective part of this whole process. I've run into all of the issues you describe, and the question is ultimately: how do you define a story?
Your GOP primaries example is a good one. Let's say we have articles on Trump's legal issues, other articles on Pence's classified documents, and other articles on DeSantis and books. Now let's say all of these articles describe these things in the context of the 2024 GOP primaries. Is this one story called "GOP primaries"? Or three separate stories? You could make a case either way.
I've tuned the algorithm to split stories in a way that "looks about right" to me. That's subjective, but there's no way around it. This is an issue whether you're using an algorithm or doing this manually.
A related challenge is that story definitions may change over time. The classified documents story is a good example for this. Right now there are articles on Trump, Biden, and Pence all mishandling classified documents. The algorithm is categorizing all of them as the same story (fair enough).
But let's say that next week (just making this up), Trump gets indicted for it. Is that a separate story now? If so, how do you treat that? Do you retroactively split out the "Trump" portion of the "classified documents" story as though they were not the same story before? Do you show the classified documents story splitting into two? Do you just create a new story on the day the indictment happens? Currently, the algorithm is set up to do the first of these, but again, you could make a case for any of them.
All of this is to say that there is subjectivity involved in this process.
wintherwheels t1_j6emebf wrote
Reply to [OC] Longer Living Through Pharmacology: Life Expectancy vs. Pharmacists Per Capita, by Country by whatweshouldcallyou
I thought this was all one chart. I was zoomed in on mobile scrolling around trying to figure out what the hell was wrong with the axis. Hurt my brain.
ghostfaceschiller t1_j6ej1pw wrote
Reply to comment by kilopeter in Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter
I recently put together a repo of character frequency analyses, bc they can be really useful when designing keyboard layouts. So I have an eye out rn for interesting ways to look at and visualize the data. I think this particular instance is probably too limited to be useful for keyboard layouts, but if you do anything more please let me know! It’s one of the more interesting visualizations I’ve seen so I’d love to include/link it
kilopeter OP t1_j6ecjw9 wrote
Reply to comment by ghostfaceschiller in Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter
I haven't, but good point. The code to count transitions between characters is very straightforward (well... I wrote mine without worrying about performance issues), and in principle could be packaged as a lightweight web app or even a JavaScript-powered static site and accept any text corpus uploaded or linked by the user.
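For anyone curious, counting transitions between characters could look roughly like the Python below. This is only a sketch, not the code behind the chart: the function name and the three sample names are made up, and it ignores start-of-name and end-of-name markers, which the original may or may not have included.

```python
from collections import Counter, defaultdict

def transition_percentages(names):
    """Map each letter to the percentage split of the letters that follow it."""
    counts = defaultdict(Counter)
    for name in names:
        letters = name.lower()
        # Pair each letter with its successor: "emma" -> (e,m), (m,m), (m,a)
        for cur, nxt in zip(letters, letters[1:]):
            counts[cur][nxt] += 1
    table = {}
    for cur, following in counts.items():
        total = sum(following.values())
        table[cur] = {nxt: 100.0 * n / total for nxt, n in following.items()}
    return table

# Hypothetical toy corpus; any list of strings works the same way.
table = transition_percentages(["Emma", "Ella", "Mia"])
print(table["e"])
```

Each row is normalized on its own, so the percentages are conditional on the current letter and every row sums to 100.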
myceliyumyum t1_j6fo95i wrote
Reply to [OC] Where are Redditors from? Source link in comments by iFoegot
Only showing desktop users makes the data basically useless.