Recent comments in /f/dataisbeautiful

Chramir t1_j64xwtj wrote

They made an estimate of how many words there are in every YouTube video uploaded. That estimate is calculated as the total runtime of all the videos multiplied by the average word count in a conversation per given time. The total word count is then divided by the number of words in an average book to get a 'books size'.
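As a rough sketch, the arithmetic described above looks something like this. The function name and all three input values are placeholders I made up for illustration, not the figures the original chart used:

```python
def books_equivalent(total_runtime_minutes: float,
                     words_per_minute: float,
                     words_per_book: float) -> float:
    """Convert total video runtime into an equivalent number of books."""
    # Step 1: runtime times an assumed speaking rate gives a word count.
    total_words = total_runtime_minutes * words_per_minute
    # Step 2: divide by an assumed average book length.
    return total_words / words_per_book


# Placeholder inputs for illustration only, not real YouTube figures.
example = books_equivalent(
    total_runtime_minutes=1_000_000,
    words_per_minute=150,
    words_per_book=75_000,
)
print(example)  # 2000.0
```

The result scales directly with the assumed words-per-minute rate, so the whole estimate depends on that one number.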

I don't know, but that just seems kinda iffy. First, YouTube videos are rarely a back-and-forth conversation. And secondly, it's like pointing to a skyscraper and saying it's like a big sandcastle because sand is used in concrete.

Edit: grammar and added the 'word count' estimate explanation.

1

LiverOfStyx t1_j64sdsj wrote

Not even close to accurate. I know one road that is missing and how perfect it would be for rallying. My family owns a share of that road and the government pays for the upkeep and maintenance because it is quite a neat way to shortcut between two highways and can handle a tank... The whole area is a black void in this map when I know it is full of backroads that are in excellent condition. They cut through marshlands that form natural obstacles, unless you use those roads...

2

malachai926 t1_j64oqam wrote

To be frank, it's just poor presentation. Statisticians like myself will see lots of problems with this. If I am confused, I guarantee that the layperson will be even more so.

>red = Rep states in 2020 election

>blue = Dem states in 2020 election

Even here, you aren't being clear enough. Are they "republican" because their votes for president in the 2020 election were majority in favor of the Republican candidate? Republican because they elected more Republican House congresspeople / senators? I can infer that you're likely referring to the electoral college result, but when people have to infer what you mean with your data, that's just bad practice that is bound to get you in trouble in the future.

>t-tests are usually reported using the p-value

Not always, no. A lot of published research will tell you both the t-statistic AND the p-value. If you're giving us a p-value, you should say it's a p-value, end of story.

>the t-test is sensitive to small mean variations: the top right plot shows the means separated by a SD, which is NOT a small difference ( t-test = 0.000015).

That's great, but why didn't you state that result in the graph? And again, don't say "t-test equals"; at least say "t-test p-value equals". It's nonsense to say that a test equals something. The test generates a statistic and a p-value which equal something, but the test itself is a test. It pays to be explicit with what you are saying, or else other statisticians could misinterpret what you are saying. In this case, if someone thought you meant the t-statistic was 0.000015, that would mean the results were highly non-significant, and they would think you screwed up your calculation.
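To make the distinction concrete, here is a minimal sketch using only the Python standard library. It assumes a Welch-style t-statistic and a normal approximation for the p-value, which is only reasonable for large samples. Neither choice comes from the original post, and the function names are my own:

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_statistic(a, b):
    """Welch's t-statistic for two independent samples (assumed test variant)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def approx_two_sided_p(t_stat):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Reading 0.000015 as a t-statistic gives a p-value close to 1,
# i.e. no evidence of any difference between the groups.
print(approx_two_sided_p(0.000015))

# Reading 0.000015 as a p-value instead implies |t| of roughly 4.3
# under the same approximation.
print(NormalDist().inv_cdf(1 - 0.000015 / 2))
```

The same number means opposite things depending on which quantity it is, which is why the label matters.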

You seem to have some idea in your mind of how things are "typically" interpreted by various groups of people, but you should NOT rely on those assumptions because inevitably someone will interpret gray area in a way you didn't intend. It is always far, far preferable to be as explicit as you can with your definitions of things.

Again I think showing this as a sorted scatterplot is just weird. You really ought to show this data as a histogram. You're using a t-test, yeah? So it's really incumbent on you to demonstrate that the data really does follow the shape of a t-distribution to prove to your audience that such a test is acceptable. A histogram achieves that; this scatterplot does not.

Finally, maybe it's just me, but grouping these things together on a state level just feels like you're losing so much detail and misclassifying so much data that I really question the validity of your results. Maybe this is the best you have to work with, but you are classifying a state that went 51% in favor of the Democrat as 100% Democratic and vice versa, which then classifies every single school district in that state, including the likely numerous rural school districts where people are more likely to be conservative, as "Democratic" school districts contributing however much money they contributed towards education. You'd get a lot more robust data and far less of this kind of error if you were able to get this data by school district. If you don't have that data, it is what it is, but the end result is that I'll consider everything I said here and think "eh, this is kinda just bad analysis and is meaningless" and it gets disregarded. And I imagine you wouldn't want the analysis you spent all of this time and effort on to be disregarded, yeah?
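Here is a small sketch of the misclassification I mean. The districts, vote shares, and spending numbers are entirely hypothetical, and the state share is a plain unweighted average of district shares, which is a simplification:

```python
# Hypothetical districts: (state, Democratic vote share, spending per pupil).
districts = [
    ("A", 0.70, 12_000),
    ("A", 0.40, 9_000),
    ("A", 0.45, 9_500),
]


def label(share):
    """Winner-take-all label for a single vote share."""
    return "Dem" if share > 0.5 else "Rep"


def state_level_labels(rows):
    """Label every district by its state's average share."""
    by_state = {}
    for state, share, _ in rows:
        by_state.setdefault(state, []).append(share)
    state_label = {s: label(sum(v) / len(v)) for s, v in by_state.items()}
    return [state_label[state] for state, _, _ in rows]


def district_level_labels(rows):
    """Label each district by its own share."""
    return [label(share) for _, share, _ in rows]


by_state = state_level_labels(districts)
by_district = district_level_labels(districts)
mismatches = sum(s != d for s, d in zip(by_state, by_district))
print(by_state, by_district, mismatches)
```

In this made-up state, two of the three districts end up with the opposite label from the one their own vote share would give them.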

2

latinometrics OP t1_j64npvc wrote

From our newsletter:

Is Puerto Rico the music capital of the world? 🇵🇷

Puerto Rico, with 3.3M people or 0.4% of LatAm's population, is the birthplace of 6 of the region's top 10 most streamed artists on Spotify. Many of them are also top artists worldwide.

There is no way such a stat is a product of chance.

There must be an incredible force behind the success of so many artists from a tiny island roughly the size of Connecticut, the US's 3rd smallest state.

For over a hundred years, the island has been the motherland of original music genres:

• Bomba by enslaved Africans

• Plena by Jíbaros (native farmers)

• Danza (adapted from Europe's contradanza)

• and more recently, Reggaeton and Latin trap

The US territory, the only one that maintains Spanish as the official language, has one of the world's highest concentrations of music stars per capita (perhaps the highest).

When looking at Spotify streams, singers like Ricky Martin or Chayanne are at a disadvantage because they became big well before Spotify existed, so they do not appear on our chart.

However, the last 20 or so years have brought about a new era of rappers like Residente and reggaeton superstars, best exemplified by Bad Bunny, who has been the most streamed artist on the planet for three years in a row.

Bad Bunny was inspired by “the King of reggaeton,” Daddy Yankee, who is 4th on the list despite also having somewhat of a disadvantage. His iconic song, Gasolina, came out in 2004, when your writer still burned custom CDs and used a Walkman.

Gasolina was ranked #50 on Rolling Stone's 500 Greatest Songs of All Time, and there's absolutely no way you haven’t heard it before.

Colombian J Balvin is number two on the list and has more streams than Dua Lipa and Taylor Swift. Behind every great artist, there's a great producer.

In J Balvin's case, that person is “Sky Rompiendo,” who is responsible for some of J Balvin's greatest hits and collaborations like Safari with Pharrell Williams. Sky has also produced songs for Ozuna and Maluma, also top 10 artists, and many other Latin stars.

So, undoubtedly, Puerto Rico and Colombia are LatAm's music capitals. The big inexplicable question is: why are there 0 artists from Brazil and Mexico in Latin America's top 10? 🇲🇽🇧🇷

2

Jrubas t1_j64nlyo wrote

Fox News is a lying propaganda machine too. None of these media outlets is objective. They're all biased one way or another and staffed by lying scumbags. And that's just the news team. The pundits are actively evil and work night and day to continue dividing us. If there's a wound on the body politic, you can count on these assholes to rub dirt and broken glass in it.

1