Recent comments in /f/dataisbeautiful

ianhillmedia t1_j6db30j wrote

Got it, thanks for the reply! I know not everyone supports RSS, and it’s a challenge when folks format RSS in different ways, but since RSS feeds are a primary source from the publisher, I’d encourage you to use RSS over APIs from Google.

I was curious about the signals in your algorithm as well. One of the challenges with automating taxonomies for news stories is the inexactitude of language and differences in style. A story might mention DeSantis and books in the headline and description but might actually be about GOP primaries; a story might emphasize DeSantis in the primaries in the headline and title but might actually be about book banning.

Or a better example: a story that mentions Tyre Nichols may be about the actual incident, police violence or defunding the police.

Digging in even further, a local news organization might use colloquialisms for place names that can make it difficult for folks from outside that market to categorize those stories.

2

JackdiQuadri97 t1_j6da7et wrote

Really bad division between regions, and the different scales in the graphs make comparison between regions much harder.

Also, the title seems to imply causation and, let's be honest, the number of pharmacists has a pretty low impact on life expectancy. They just hand out simple medicines or whatever the doctor prescribed. Doctors per capita would definitely be a much better comparison.

15

Enola_Gay_B29 t1_j6d756m wrote

Do you have a link to that data, too? Because the one you posted in another comment (https://www.theglobaleconomy.com/rankings/pharmacists_per_1000_people/) doesn't correlate with anything shown in your graphs at all. There is only data from 2 Asian countries (South Korea and Israel), 20 European ones, as well as Canada, Australia and New Zealand. And the values range from 0.2 to 1.3, not in the double digits like in your graphs.

3

PartisanPlayground OP t1_j6d6ebz wrote

I'm getting the data from the Google News API. I've used RSS feeds in the past with similar results.

And actually I'm using a clustering algorithm to identify the specific stories. I have an automated process that pulls all articles from the past five days, clusters them into stories, then produces a bunch of analysis. This saves me a lot of time and brings some objectivity to the process.
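For readers wondering what "clusters them into stories" could look like in practice, here is a minimal sketch. The comment does not say which clustering algorithm is used, so this swaps in a simple greedy grouping by word overlap (Jaccard similarity) on headlines. The function names, the 0.3 threshold, and the sample headlines are all hypothetical and chosen only for illustration.

```python
def tokenize(headline):
    """Lowercase the headline and keep words longer than three characters."""
    words = (w.strip(".,:;!?\"'()").lower() for w in headline.split())
    return {w for w in words if len(w) > 3}


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_headlines(headlines, threshold=0.3):
    """Greedily assign each headline to the most similar existing cluster.

    A headline starts a new cluster when no existing cluster reaches the
    (assumed) similarity threshold.
    """
    clusters = []  # each entry: {"tokens": set of words, "items": list of headlines}
    for headline in headlines:
        tokens = tokenize(headline)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(tokens, cluster["tokens"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"tokens": set(tokens), "items": [headline]})
        else:
            best["items"].append(headline)
            best["tokens"] |= tokens
    return [cluster["items"] for cluster in clusters]


# Hypothetical headlines, not real data.
headlines = [
    "Senate passes budget bill after long debate",
    "Budget bill passes Senate following debate",
    "Storm brings heavy snow to Midwest",
]
stories = cluster_headlines(headlines)
```

With these sample headlines, the two budget headlines end up in one story and the storm headline in another. A real pipeline would likely use something more robust than raw word overlap, for the reasons raised elsewhere in this thread about headline wording.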

1

tuctrohs t1_j6d4y65 wrote

I seem to be having trouble communicating here. Absolutely, dollar cost averaging is the way to go. The other stuff I'm suggesting, I'm suggesting because it's worse, and if you want to show how good dollar cost averaging is, you need to compare it to something that is worse, but that somebody naive might do. I'm suggesting these for your plot, not for your investing strategy. For investing strategy, dollar cost averaging is the way to go, unless you are a genius or you have a better understanding of the particular market and company than is available to expert investors.

3

Enola_Gay_B29 t1_j6d46rd wrote

What kind of categories are those?

You divide Asia up between Eastern Mediterranean (btw. Morocco is as far west as you can get in the Mediterranean (even more than Algeria, which you put into Africa), but ok), South East Asia (why tf is one of the Koreas in that category?) and Western Pacific, but then lump Uzbekistan and Turkmenistan in with Europe? Also, how tf is Cambodia Western Pacific, but Indonesia and Timor-Leste are not? Have you ever looked at a map?

2

ianhillmedia t1_j6d3dqb wrote

Happy to help! And I think you’re spot on when you say you need to clarify the definition of prevalence. Just because a news org puts resources into a topic doesn’t mean it’s prevalent to the user. That said, the number of stories a news org efforts on a subject is an interesting data point.

As someone on the other side of this, I hear you on the challenges associated with getting useful data. How are you currently tracking all articles published by those news orgs? And how are you parsing that data to identify specific stories - what search terms are you using to filter the data?

2

the_scign t1_j6d2no9 wrote

Number is the percentage chance of the next letter being the column letter, assuming you're already at the row letter. I.e. if you are at a "Q", there's a 95% chance the next letter will be a "U". This says nothing about the "popularity" of the letter combination by itself.
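A table like this can be built by counting adjacent letter pairs and normalizing each row. The sketch below is only an illustration of that idea, not the code behind the original chart; the function name and the three sample names are hypothetical.

```python
from collections import Counter, defaultdict


def next_letter_probabilities(names):
    """Return {row_letter: {column_letter: probability}} from adjacent letter pairs."""
    pair_counts = defaultdict(Counter)
    for name in names:
        letters = [ch for ch in name.upper() if ch.isalpha()]
        # Count every (current letter, next letter) pair in the name.
        for current, following in zip(letters, letters[1:]):
            pair_counts[current][following] += 1

    table = {}
    for current, counts in pair_counts.items():
        total = sum(counts.values())
        # Normalize so that each row sums to 1.
        table[current] = {nxt: n / total for nxt, n in counts.items()}
    return table


# Hypothetical sample names, for illustration only.
sample = ["Quinn", "Quentin", "Anna"]
table = next_letter_probabilities(sample)
```

In this tiny sample every "Q" is followed by "U", so `table["Q"]["U"]` is 1.0, even though "QU" itself is a rare pair overall. That is the distinction between the conditional chance and the popularity of the combination.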

"A" as a next letter follows most consonants. Not necessarily the second letter in the name.

2

PartisanPlayground OP t1_j6d2gwe wrote

This is an excellent comment, thank you for this!

I think I need a clearer way of describing "prevalence". This chart is showing the top ten stories by the share of articles written about them, not by the amount that they are consumed. I take articles from 64 sources every day, cluster them together into "stories", then calculate each story's share based on the number of articles written about it. For example, if there are 1000 articles for a day, and one story has 100 articles written about it, then its share is 10%. Does that make sense?
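The share calculation described above can be sketched in a few lines. This assumes each article has already been assigned a story label by the clustering step; the function name and the labels are hypothetical.

```python
from collections import Counter


def story_shares(story_labels):
    """Map each story label to its fraction of all articles for the day."""
    counts = Counter(story_labels)
    total = len(story_labels)
    return {story: n / total for story, n in counts.items()}


# Hypothetical day: 1000 articles, 100 of them about one story.
labels = ["story_a"] * 100 + ["everything_else"] * 900
shares = story_shares(labels)
```

Here `shares["story_a"]` comes out to 0.1, matching the 10% in the example. Note that this measures how much was written about a story, not how much it was read.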

I've explored measuring consumption of news in the past, and found it to be very difficult! (Facebook's Graph API used to be wide open, so I was able to get likes/engagement on news stories there, but it has since been locked down.) Your comment does a great job of explaining the complexity in measuring consumption. You would need to combine:

- GA data from news outlets (which they don't publish)

- Cable news data (sources exist for this, but you would need to make a lot of assumptions to combine this with articles)

- Social media data

And you would need to make a lot of assumptions about what weights to use on each of those. As a result, I'm keeping this simple and focusing on article shares.

I do publish a daily automated Twitter thread on which news outlet gets the most engagement on Twitter. It includes the most liked and ratioed tweets from each "side" of the media. This is limited to Twitter, so does not cover all the channels you described. See an example here: https://twitter.com/PartisanPlayG/status/1619300675094970369

The other thing I've been doing is cutting articles by which "side" of the media they're on using media bias ratings from AllSides. Again, this involves some simplifying assumptions so it's not perfect but gives a good high-level view. You can see examples here: https://partisanplayground.substack.com

Thanks again for your comment. This is exactly the sort of thing I was looking for when I posted.

2

comicmuse1982 t1_j6d15gm wrote

Nope. You are wrong. They are called zoomers because they go from A to Z so quickly... They are already at Z.

Always on the move, zipping and zapping, zigging and zagging. Zoomers.... Faster than a Yoomer.

The burger flipper can buy a house if they go freelance and get paid per flipped burger. They'd zoom through it and increase their earnings significantly. Or they could set up a spatula subscription service where people will receive a spatula whenever they need to flip a burger. A corn starch plastic disposable spatula.

2

Datapunkt t1_j6d0ju6 wrote

You were thinking about this a lot but never thought that there might be regulations on how a car can look and function? Just for starters, not like there are hundreds of other reasons why a team shouldn't do it, like... trying to get as many points as possible and not relying on the fact that 10 cars need to DNF in order to get into the points.

1