Recent comments in /f/deeplearning

dualmindblade t1_ivbs4ra wrote

By the other side, I meant the other side of the board, but let's explore your ideas a bit in the context of board game algorithms. In the case of the AlphaZero algorithm, the opponent is itself. The neural network part of AlphaZero acts as a sort of intuition engine, and it's trying to intuit two related but actually different things: 1) the value of a particular move, how good or bad it is, and 2) which move AlphaZero itself will be likely to choose after it has thought about it for a long time. By thinking, I mean running many, many simulated games from the current position, making random moves probabilistically weighted by those intuitions. This is the novel idea of the algorithm, and it allows it to drastically magnify the amount of data used to train the neural network. Instead of having to play an entire game to get one tiny bit of feedback, it gets a signal for every possible move every turn; the network weights are updated based on how well it predicts its own behavior. There's growing evidence that animal brains do something similar; this is called the predictive processing model of cognition.

Anyway, I want to point out that this very much seems like a theory of mind, except it's a theory not of another mind but of its own. BTW, AlphaZero becomes, after training, ridiculously good not only at predicting its own behavior but at predicting the value of a move. The Go-playing version can beat all but the very best professional players without doing any tree search whatsoever, in other words making moves using only a single pass through the NN part of the architecture (the intuition) and not looking even one move ahead. Likewise, it is remarkably accurate, though not perfectly so, at predicting its final decision after searching the game tree, so its conception of self is accurate.
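To make the "weighted by intuition" part concrete, here's a rough sketch of the PUCT-style rule that AlphaZero-like engines use to decide which move to simulate next, and of the visit-count target that trains the network to predict its own post-search behavior. Plain numpy, and all names and constants are my own for illustration, not taken from the papers:

```python
import numpy as np

def select_child(priors, visit_counts, q_values, c_puct=1.5):
    """Pick the next move to simulate: exploit good average outcomes (Q),
    but explore moves the 'intuition' (policy prior) likes."""
    total_visits = visit_counts.sum()
    ucb = q_values + c_puct * priors * np.sqrt(total_visits + 1) / (1 + visit_counts)
    return int(np.argmax(ucb))

def policy_target(visit_counts, temperature=1.0):
    """After the search, the visit-count distribution becomes the training
    target for the policy head -- the net learns to predict its own
    considered behavior."""
    counts = visit_counts ** (1.0 / temperature)
    return counts / counts.sum()

# toy example: 4 legal moves
priors = np.array([0.5, 0.3, 0.15, 0.05])   # policy head ("intuition")
visits = np.array([40, 30, 20, 10], float)   # simulations spent on each move
q = np.array([0.1, 0.05, -0.2, 0.0])         # average simulated outcomes
print(select_child(priors, visits, q))       # which move to expand next
print(policy_target(visits))                 # training target for the policy head
```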

Now there's another game-playing engine called Maia; this is designed not to beat humans but to play like they do, and it's quite good at this. It can imitate the play of very good amateurs all the way up to professionals. There's absolutely no reason this couldn't be integrated into the AlphaZero algorithm, providing it with not only a theory of its own mind but also a theory of a (generic) human player's. And if you don't like the generic part, there are engines fine-tuned on single humans, usually professional players with a lot of games in the database. So basically, yes, they are stimulus-react models, and they always will be, but they're complicated ones where the majority of the stimulus is generated internally, and humans probably are too. And they are capable even today of having a theory of mind by any reasonable definition of what that means.

1

sckuzzle t1_ivbjs9d wrote

If I were to approach this, I'd train them at the same time. You have two models - one for each side - each with its own reward function. Then you'd train them in parallel, playing against each other as they go; see the sketch below.

It's a bit of a challenge because you can only train each one relative to the strength of the other - so you need them both to get "smarter" in order to continue their training. But that's no different from a model that trains against itself.
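As a minimal sketch of that parallel setup - using matching pennies as a stand-in for the real game and a REINFORCE-style update purely for illustration, none of which comes from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)
# Each side keeps its own preferences (logits) over its two actions.
logits_a = np.zeros(2)   # model for side A
logits_b = np.zeros(2)   # model for side B
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(5000):
    pa, pb = softmax(logits_a), softmax(logits_b)
    act_a = rng.choice(2, p=pa)
    act_b = rng.choice(2, p=pb)
    # Zero-sum rewards: A wants to match, B wants to mismatch.
    r_a = 1.0 if act_a == act_b else -1.0
    r_b = -r_a
    # Each side gets its own policy-gradient update, trained in parallel.
    grad_a = -pa; grad_a[act_a] += 1.0
    grad_b = -pb; grad_b[act_b] += 1.0
    logits_a += lr * r_a * grad_a
    logits_b += lr * r_b * grad_b

# Both tend to hover around the mixed ~50/50 strategy of this game.
print(softmax(logits_a), softmax(logits_b))
```

The point of the toy: each model only improves relative to the other, so their strengths have to rise together, exactly the coupling described above.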

2

sckuzzle t1_ivbiskk wrote

> so it must understand both strategies about equally regardless of which side it's playing

What do you mean here by "understand"? My understanding is that the state-of-the-art AI has no concept of what the capabilities of its opponent are or even what its opponent might be thinking; it only understands how to react in order to maximize a score.

So while you could train it to react well no matter which side it is playing, how would it benefit from being able to play the other side better? It would need to spin up a duplicate of itself to play the other side and then analyze itself to understand what is happening, but then it would just get into an infinite loop as its duplicate spins up a duplicate of its own.

I guess what I'm getting at is that these AI algorithms have no theory of mind. They are simple stimulus-react models. Even the concept of an opposing player is beyond them - it'd be the same whether it was playing solitaire or chess.

1

InfuriatinglyOpaque t1_ivb9otw wrote

In terms of papers using relevant tasks, the closest example I can think of might be the "hide and seek" paradigm used by Weihs et al. (2019), which I include in my list below (not my area of expertise - so there could easily be far more relevant papers out there that I'm not aware of). I wouldn't be the least bit surprised though if there were lots of relevant ideas that you could take from prior works using symmetric games as well, so I've also included a wide variety of other papers that all fall under the broad umbrella of modeling game-learning/game-behavior.

References

Aggarwal, P., & Dutt, V. (2020). Role of information about opponent's actions and intrusion-detection alerts on cyber-decisions in cybersecurity games. Cyber Security. https://www.ingentaconnect.com/content/hsp/jcs/2020/00000003/00000004/art00008

Amiranashvili, A., Dorka, N., Burgard, W., Koltun, V., & Brox, T. (2020). Scaling Imitation Learning in Minecraft. http://arxiv.org/abs/2007.02701

Bramlage, L., & Cortese, A. (2021). Generalized Attention-Weighted Reinforcement Learning. Neural Networks. https://doi.org/10.1016/j.neunet.2021.09.023

Frey, S., & Goldstone, R. L. (2013). Cyclic Game Dynamics Driven by Iterated Reasoning. PLoS ONE, 8(2), e56416. https://doi.org/10.1371/journal.pone.0056416

Guennouni, I., & Speekenbrink, M. (n.d.). Transfer of Learned Opponent Models in Zero Sum Games.

Hawkins, R. D., Frank, M. C., & Goodman, N. D. (2020). Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 44(6), e12845. http://arxiv.org/abs/1912.07199

Kumaran, V., Mott, B. W., & Lester, J. C. (2019). Generating Game Levels for Multiple Distinct Games with a Common Latent Space. https://ojs.aaai.org/index.php/AIIDE/article/view/7418

Lampinen, A. K., & McClelland, J. L. (2020). Transforming task representations to perform novel tasks. Proceedings of the National Academy of Sciences, 117(52), 32970–32981. https://doi.org/10.1073/pnas.2008852117

Lensberg, T., & Schenk-Hoppé, K. R. (2021). Cold play: Learning across bimatrix games. Journal of Economic Behavior & Organization, 185, 419–441. https://doi.org/10.1016/j.jebo.2021.02.027

Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, D., Bachman, P., & Courville, A. (2021). Pretraining Representations for Data-Efficient Reinforcement Learning. http://arxiv.org/abs/2106.04799

Sibert, C., Gray, W. D., & Lindstedt, J. K. (2017). Interrogating Feature Learning Models to Discover Insights Into the Development of Human Expertise in a Real-Time, Dynamic Decision-Making Task. Topics in Cognitive Science, 9(2), 374–394. https://doi.org/10.1111/tops.12225

Spiliopoulos, L. (2013). Beyond fictitious play beliefs: Incorporating pattern recognition and similarity matching. Games and Economic Behavior, 81, 69–85. https://doi.org/10.1016/j.geb.2013.04.005

Spiliopoulos, L. (2015). Transfer of conflict and cooperation from experienced games to new games: A connectionist model of learning. Frontiers in Neuroscience, 9. https://doi.org/10.3389/fnins.2015.00102

Stanić, A., Tang, Y., Ha, D., & Schmidhuber, J. (2022). Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter (No. arXiv:2208.03374). arXiv. http://arxiv.org/abs/2208.03374

Tsividis, P. A., Loula, J., Burga, J., Foss, N., Campero, A., Pouncy, T., Gershman, S. J., & Tenenbaum, J. B. (2021). Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning. http://arxiv.org/abs/2107.12544

Weihs, L., Kembhavi, A., Han, W., Herrasti, A., Kolve, E., Schwenk, D., Mottaghi, R., & Farhadi, A. (2019). Artificial Agents Learn Flexible Visual Representations by Playing a Hiding Game. http://arxiv.org/abs/1912.08195

Zheng, Z. (Sam), Lin, X. (Daisy), Topping, J., & Ma, W. J. (2022). Comparing Machine and Human Learning in a Planning Task of Intermediate Complexity. Proceedings of the Annual Meeting of the Cognitive Science Society, 44(44). https://escholarship.org/uc/item/8wm748d8

3

arhetorical t1_ivb2sl1 wrote

I haven't tried doing that, but if it's a similar resource requirement to prototyping (like if you'll be working with a pretrained model, not training one) then it should be fine. Again though, the biggest factor is whether you like it and whether it works for you - since you bought a laptop instead of a workstation, you must have had a very good reason for needing one, and none of us can answer that question for you. If you're not training, then as long as your stuff fits in memory the specs don't matter that much.

1

dualmindblade t1_iva9p3g wrote

I would suggest reading the original AlphaGo paper, it's extremely digestible, then skimming the AlphaZero one; there's less detail there because it's a very similar architecture, and actually simpler than the original. Think of AlphaZero as a scheme for improving the loss function; the actual architecture of the NN part is sort of unimportant. You can think of it as a black box, or maybe a black box with two smaller boxes sticking out of it.
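For a picture of that black box with two smaller boxes sticking out, here's a toy PyTorch sketch of a shared trunk with a policy head and a value head. The sizes and layer choices are made up; only the two-headed shape is the point:

```python
import torch
import torch.nn as nn

class TwoHeadedNet(nn.Module):
    def __init__(self, board_size=19, channels=64, n_moves=19 * 19 + 1):
        super().__init__()
        # the big black box: a shared trunk over the board planes
        self.trunk = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        flat = channels * board_size * board_size
        # the two smaller boxes sticking out of it
        self.policy_head = nn.Linear(flat, n_moves)  # "which move will I pick?"
        self.value_head = nn.Linear(flat, 1)         # "how good is this position?"

    def forward(self, board_planes):
        h = self.trunk(board_planes)
        return self.policy_head(h), torch.tanh(self.value_head(h))

net = TwoHeadedNet()
policy_logits, value = net(torch.zeros(1, 3, 19, 19))
print(policy_logits.shape, value.shape)  # torch.Size([1, 362]) torch.Size([1, 1])
```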

3

computing_professor OP t1_iva7s6f wrote

Cool, thanks for the reply. With chess, I always assumed it was just examining the state as a pair (board,turn), regardless of who went first. I study the mathematics of combinatorial games and it's rare to ever consider who moves first, as it's almost always more interesting to determine the best move for any given game state.

Do you have any reading suggestions for understanding AlphaZero? I've read surface level/popular articles, but I'm a mathematician and would like to dig deeper into it. And, of course, learn how to apply it in my case.

3

dualmindblade t1_iva5ncr wrote

Disclaimer: not even close to an expert, I just keep up with the state of the field

If you were using something like the AlphaZero algorithm, I'm fairly certain the asymmetry is not an issue; it would work unmodified. I also don't think you'd want to use two models, as that would weaken the play. The argument is that the NN part is trying to intuit the properties of a big tree search in which both players are participants, so it must understand both strategies about equally regardless of which side it's playing. It's no different from a human player: when you make a move, the next step is to consider the resulting position from the other side and evaluate their potential moves. BTW, chess is not super symmetric in practice; usually black will need to adopt a defensive strategy in the opening.
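One common way to let a single model play both sides (a generic sketch of the usual encoding trick, not anything specific to your game) is to feed it the state as the pair (board, turn): the piece planes plus a constant plane saying whose move it is, so the same network evaluates every position for whichever side is playing:

```python
import numpy as np

def encode_state(board_planes, player_to_move):
    """Stack a constant 'side to move' plane onto the piece planes.
    board_planes: (n_piece_types, H, W) binary planes; player_to_move: +1 or -1."""
    turn_plane = np.full((1,) + board_planes.shape[1:], float(player_to_move))
    return np.concatenate([board_planes, turn_plane], axis=0)

planes = np.zeros((2, 3, 3))           # e.g. one plane per side's pieces
planes[0, 1, 1] = 1                     # side A has a piece in the middle
print(encode_state(planes, +1).shape)   # (3, 3, 3): pieces + whose turn it is
```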

4

Zer01123 t1_iv9uq1w wrote

For what you want it to do, it probably has more than enough performance. Maybe even overkill if you just want to play around with deep learning.

Remember that depending on what you want to do, you might need a big SSD for all the datasets, and laptops usually come with quite small ones in the default configuration.

1

macORnvidia OP t1_iv8ssle wrote

>An external GPU will just make your setup less portable without actually giving you the performance of a workstation

Can you please elaborate?

Also, as for training, I get it - I can't really train deep learning models - but what about optimizing machine learning models using pyCUDA?

1

arhetorical t1_iv8nb2t wrote

The only thing that matters is whether you like it. The specs really don't matter that much. Either you'll be prototyping your model, in which case you'll just be training for an epoch or two and better specs will only save you a little bit of time, or you'll be training it for real, in which case a laptop is not going to cut it. An external GPU will just make your setup less portable without actually giving you the performance of a workstation.

1