Nameless1995
Nameless1995 t1_ivpm198 wrote
Reply to comment by [deleted] in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
I think as long as we are not purely duplicating the brain, there would always be something different (by definition of not duplicating). The question then becomes the relevancy of the difference. I think there is some plausibility to the idea that some "formal" elements of the brain associated with cognition can be simulated in machines, but would that be "sufficient" for "understanding"? This question partly hinges on semantics. We can choose to define understanding such that it's fully a matter of achieving some high-level formal functional capabilities (abstracting away from the details of the concrete "matter" that realizes the functions). There is a good case to be made that perhaps it's better to think of mental states in terms of higher-level functional roles than "qualitative feels" (which is not to say there aren't qualitative feels, but that they need not be treated as "essential" to mental states -- the roles may as well be realized in an analogous fashion without the same feels, or any feels). If we take that stance, the point of having or lacking phenomenal feels (and phenomenal intentionality) becomes moot, because all that would matter for understanding would be a more abstracted level of formal functionality (which may as well be computational).
If, on the other hand, we decide to treat "phenomenal feels" (and "phenomenal intentionality") as "essential" to understanding (by definition -- again a semantics issue), then I think it's right to doubt whether arbitrary realizations of some higher-level behavioral forms (abstracted away from phenomenal character) would necessarily lead to having certain phenomenal feels.
Personally, I don't think it's too meaningful to focus on "phenomenal feels" for understanding. If I say "I understand 1+1=2" and try to reflect on what it means for me to understand that, the phenomenality of the experience seems to contribute very little, if anything -- beyond potentially serving as a "symbol" marking my understanding (a symbol represented by me feeling a certain way; alternatively, non-phenomenal "symbols" may have been used as well) -- but that "feeling" isn't true understanding, because it's just a feeling. Personally, then, I find the best way to characterize my understanding is to ground it in my functional capabilities to describe and talk about 1+1=2, talk about number theory, do arithmetic, and so on -- it then boils down to possession of "skills" (which becomes a matter of degree).
It may be possible that biological material has something "special" that constitutes phenomenality-infused understanding, but this is hard to make out given the problem of even determining public indicators for phenomenality.
Nameless1995 t1_ivpjf1l wrote
Reply to comment by PrivateFrank in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
This is a bit of a "semantics" issue. As I also said in the other comment, it all boils down to semantics about how we decide to define "understanding" (a nebulous notion) in the first place (even intentionality is a very unclear concept -- the more I read papers on it, the more confused I become because of the many diverse ways people approach the concept).
If we define understanding such that it by definition requires "updating" information and the like (which I think is a bit of a weird definition relative to the standard usage of understanding in the community -- would an omniscient being with the optimal policy already set up be treated, in theory, as incapable of understanding?), then yes, the vanilla CRA wouldn't understand, but that would not make any interesting claim about the capabilities of programs.
Either way, Searle's point in using the CRA was more for the sake of illustration, to point towards something broader (the need for some meaningful instantiation of programs to realize understanding properly). For Searle, the point mostly stands for any formal program (with update rules or not). In principle, the CRA can be modified correspondingly (think of the person following rules of state transitions -- now the CRA can be allowed to have update rules based on examples and feedback signals as well). But yes, it may then not be as intuitive to people whether the CRA would count as "not understanding" at that point. Searle was most likely trying to point towards "phenomenality", and how an arbitrary instantiation of "understanding-programs" would not necessarily realize some phenomenal consciousness of understanding. But I don't think it's really necessary to have phenomenal consciousness for understanding (although, again, that's partly a semantic disagreement about how to carve out "understanding").
Nameless1995 t1_ivpf7g8 wrote
Reply to [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
Friston's characterization of qualia as knowledge about internal states is how I treat them too. But it still doesn't explain why the knowledge "feels like" something instead of the "information access" being realized simply by blind dispositions to act and talk in a certain manner (including the use of "qualia" language serving as an information bottleneck to simplify complex internal dynamics and allow simpler communication among intra-mind components and between minds).
Ultimately I don't think there is anything mysterious. The formal functional characteristics that we find in mathematical forms are ultimately realized by physical concrete things with physical properties (whatever "physicality" may be -- which may turn out to be idealistic in essence). One of the properties of certain physical configurations may be qualitativity. Qualitativity need not be "essential" for the realization of cognitive forms (or access to internal-state information), but simply one "way" the realization happens in biological (and possibly non-biological) entities. Whether similar forms of realization can happen in silicon is then up to a theory of consciousness that accounts for which physical configurations, and what exactly, lead to qualitative dispositions as opposed to non-feel dispositions.
I also don't think Bishop's answer to the need for phenomenality answers anything. Why should uncertainty reduction require phenomenality per se? You can just as well have a higher-order decision mechanism that experiments, makes discrete actions in the world, intervenes, and so on to reduce the search space without any appeal to phenomenality. I think it's possible that this is how our "phenomenal consciousness" contingently works at a certain level, but that doesn't mean there could not have been alternatives without phenomenality that realize some uncertainty-reduction functionality (especially if we can make a mathematical model of this process).
Nameless1995 t1_ivpc688 wrote
Reply to comment by PrivateFrank in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
Right. The Book in the CRA is meant to represent the true rules (or at least something "good enough" relative to a human with bilingual capabilities) at a given time, so the "need" for updating rules from feedback is removed (feedback is needed in practical settings because we are not in a thought experiment which stipulates some oracle access). The point is that the practical need of contemporary ML models for refinement (given the lack of magical access to data-generating processes) doesn't entail the 'in principle' impossibility of writing down serviceable rules of translation for a specific time instance in a book.
Nameless1995 t1_ivp9j9b wrote
Reply to comment by PrivateFrank in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
> Yeah, but at the same time the translation "logic" is being continuously refined through learning.
Yes, by following more rules (rules for updating other rules).
> The book of rules is static in the old example.
True. The thought experiments make some idealization assumptions. Current programs need to "update" partly because they don't have access to the "true rules" from the beginning (that's why the older models didn't work as well either). But in the CRA, the book represents the true rules all laid bare. One issue is that in real life, the rules of translation are themselves dynamic and can change as languages change. To address that, the CRA can focus on a specific instant of time (time t), take it as given that the true rules are available for time t, and consider the question of knowledge of Chinese at time t. (But yes, there may not even be true "determinate" rules -- because of statistical variation in how individuals use language. Instead there can be a distribution of rules, each aligning with real-life usage to varying degrees of fit. The book can then be treated as a set of coherent rules that belongs to a dense area in that distribution at time t.)
Nameless1995 t1_ivp5o2x wrote
Reply to comment by geneing in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
> He never defines what "understand" means. Without a clear definition, he can play rhetorical tricks to support his argument.
Right.
> Is it really possible to translate from English to Chinese by just following a book of rules? Have you seen "old" machine translations that were basically following rules - it was trivial to tell machine translation from human translation.
Newer ones are still following rules. It's still logic gates and bit manipulation underneath.
Nameless1995 t1_ivp554d wrote
Reply to comment by trutheality in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
> The fallacy in the Chinese room argument in essence is that it incorrectly assumes that the rule-following machinery must be capable of understanding in order for the whole system to be capable of understanding.
This is "addressed" (not necessarily successfully) in the original paper:
https://web-archive.southampton.ac.uk/cogprints.org/7150/1/10.1.1.83.5248.pdf
> I. The systems reply (Berkeley). "While it is true that the individual person who is locked in the room does not understand the story, the fact is that he is merely part of a whole system, and the system does understand the story. The person has a large ledger in front of him in which are written the rules, he has a lot of scratch paper and pencils for doing calculations, he has 'data banks' of sets of Chinese symbols. Now, understanding is not being ascribed to the mere individual; rather it is being ascribed to this whole system of which he is a part."
> My response to the systems theory is quite simple: let the individual internalize all of these elements of the system. He memorizes the rules in the ledger and the data banks of Chinese symbols, and he does all the calculations in his head. The individual then incorporates the entire system. There isn't anything at all to the system that he does not encompass. We can even get rid of the room and suppose he works outdoors. All the same, he understands nothing of the Chinese, and a fortiori neither does the system, because there isn't anything in the system that isn't in him. If he doesn't understand, then there is no way the system could understand because the system is just a part of him.
> Actually I feel somewhat embarrassed to give even this answer to the systems theory because the theory seems to me so implausible to start with. The idea is that while a person doesn't understand Chinese, somehow the conjunction of that person and bits of paper might understand Chinese. It is not easy for me to imagine how someone who was not in the grip of an ideology would find the idea at all plausible. Still, I think many people who are committed to the ideology of strong AI will in the end be inclined to say something very much like this; so let us pursue it a bit further. According to one version of this view, while the man in the internalized systems example doesn't understand Chinese in the sense that a native Chinese speaker does (because, for example, he doesn't know that the story refers to restaurants and hamburgers, etc.), still "the man as a formal symbol manipulation system" really does understand Chinese. The subsystem of the man that is the formal symbol manipulation system for Chinese should not be confused with the subsystem for English.
And also in Dneprov's game:
http://q-bits.org/images/Dneprov.pdf
> “I object!” yelled our “cybernetics nut,” Anton Golovin. “During the game we acted like individual switches or neurons. And nobody ever said that every single neuron has its own thoughts. A thought is the joint product of numerous neurons!”
> “Okay,” the Professor agreed. “Then we have to assume that during the game the air was stuffed with some ‘machine superthoughts’ unknown to and inconceivable by the machine’s thinking elements! Something like Hegel’s noûs, right?”
I think the biggest problem with the CRA and even Dneprov's game is that it's not clear what the "positive conception" of understanding should be (Searle probably elaborates in some other books or papers). They are quick to quip "well, that doesn't seem like understanding; that doesn't seem to possess intentionality" and so on, but they don't elaborate on what they think possessing understanding and intentionality exactly amounts to, so that we can evaluate whether it's missing.
Even the notion of intentionality does not have a clear metaphysical grounding, and there are ways to treat intentionality within a functionalist framework (already taken in the literature) such that machines can achieve it. So it's not clear what exactly we are supposed to find in "understanding" but find missing in Chinese rooms. The bias is perhaps clearer in the Professor's rash dismissal of the objection that the whole may understand, by suggesting that some "machine superthoughts" would be needed. If understanding is nothing over and beyond a manner of functional co-ordination, then there is no need to think the systems-reply suggestion requires "machine superthoughts". My guess is that Searle and others have an intuition that understanding requires some "special" qualitative experience or cognitive phenomenology. Except I don't think these are really that special; they are merely features of the stuff that realizes the forms of cognition in biological beings. The forms of cognition may as well be realized differently without "phenomenology". As such, the argument partially boils down to "semantics": whether someone is willing to broaden the notion of understanding to remove appeals to phenomenology or any other nebulous "special motion of living matter".
> We know that humans understand things. We also know that at a much lower level, a human is a system of chemical reactions. Chemical reactions are the rule-following machinery: they are strictly governed by mathematics. The chemical reactions don't understand things, but humans do.
Note that Searle isn't arguing that rule-following machinery or machines cannot understand -- just that there is no "understanding program" per se that realizes understanding no matter how it is instantiated or simulated. This can still remain roughly true depending on how we define "understanding".
This is clarified in Q&A form in the paper:
> "Could a machine think?" The answer is, obviously, yes. We are precisely such machines.
> "Yes, but could an artifact, a man-made machine think?" Assuming it is possible to produce artificially a machine with a nervous system, neurons with axons and dendrites, and all the rest of it, sufficiently like ours, again the answer to the question seems to be obviously, yes. If you can exactly duplicate the causes, you could duplicate the effects. And indeed it might be possible to produce consciousness, intentionality, and all the rest of it using some other sorts of chemical principles than those that human beings use. It is, as I said, an empirical question. "OK, but could a digital computer think?" If by "digital computer" we mean anything at all that has a level of description where it can correctly be described as the instantiation of a computer program, then again the answer is, of course, yes, since we are the instantiations of any number of computer programs, and we can think.
> "But could something think, understand, and so on solely in virtue of being a computer with the right sort of program? Could instantiating a program, the right program of course, by itself be a sufficient condition of understanding?" This I think is the right question to ask, though it is usually confused with one or more of the earlier questions, and the answer to it is no.
> "Why not?" Because the formal symbol manipulations by themselves don't have any intentionality; they are quite meaningless; they aren't even symbol manipulations, since the symbols don't symbolize anything. In the linguistic jargon, they have only a syntax but no semantics. Such intentionality as computers appear to have is solely in the minds of those who program them and those who use them, those who send in the input and those who interpret the output.
The point Searle is trying to make is that understanding is not exhaustively constituted by some formal relations; it also depends on how the formal relations are physically realized (what sort of relevant concrete causal mechanisms underlie them, and so on).
Although, for anyone who says this, there should be a burden to explain exactly which classes of instantiation are necessary for understanding, and what's so special about those classes of instantiation of the relevant formal relations that is missed in Chinese-room-like simulations. Otherwise, it's all rather vague, with wishy-washy appeals to intuition.
Nameless1995 t1_ivi33nf wrote
Reply to comment by smallest_meta_review in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
I am not sure. It's not my area of research. I learned of some of these ideas in a presentation made by someone years ago. Some of the recent papers essentially draw a connection between distillation and label smoothing (essentially a way to provide "soft" labels -- this probably connects up with mixup techniques too). On that ground, you can justify using any kind of teacher/student, I think. Based on the label-smoothing connection, some papers go for "teacher-free" distillation. And some others seem to introduce a "lightweight" teacher instead (I am not sure if the lightweight teacher has lower capacity than the student, which would make it what you were looking for -- students having higher capacities. I haven't really read it beyond the abstract; I just found it a few minutes ago from googling): https://arxiv.org/pdf/2005.09163.pdf (doesn't seem like a very popular paper though, given it was published on arXiv in 2020 and has only 1 citation). It looks like a similar idea to self-distillation was also available under the moniker of "born-again networks" (similar to the reincarnation moniker as well): https://arxiv.org/abs/1805.04770
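To make the label-smoothing connection concrete, here's a minimal sketch (assuming PyTorch; the function names and hyperparameters are my own illustrative choices, not from any of the cited papers): distillation trains the student against the teacher's temperature-softened outputs, while label smoothing trains it against a fixed softened version of the one-hot label -- the same soft-target structure, with the teacher swapped out.

```python
# Minimal sketch (PyTorch assumed) of the "soft label" view of distillation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between teacher and student soft distributions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

def label_smoothing_loss(student_logits, labels, smoothing=0.1):
    """Same structure, but the soft targets come from a fixed smoothing rule
    rather than from a teacher (the "teacher-free" limit)."""
    num_classes = student_logits.size(-1)
    soft_targets = torch.full_like(student_logits, smoothing / (num_classes - 1))
    soft_targets.scatter_(-1, labels.unsqueeze(-1), 1.0 - smoothing)
    log_student = F.log_softmax(student_logits, dim=-1)
    return -(soft_targets * log_student).sum(dim=-1).mean()
```

Seen this way, the choice of teacher (bigger, same-size, lightweight, or none at all) is just a choice of where the soft targets come from.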
Nameless1995 t1_ivhscyv wrote
Reply to comment by smallest_meta_review in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
> (rather than decrease capacity akin to SL)
Distillation in the supervised literature doesn't always reduce capacity for the student. I believe iterative distillation and the like have also been explored, where the student has the same capacity as the teacher but distillation still leads to better calibration or some similar benefit, if I remember correctly. (https://arxiv.org/abs/2206.08491, https://proceedings.neurips.cc/paper/2020/hash/1731592aca5fb4d789c4119c65c10b4b-Abstract.html)
Nameless1995 t1_ive35bn wrote
Reply to comment by Odd-Squirrel4324 in [D] ICLR 2023 reviews are out. How was your experience ? by dasayan05
Probably. You can use the unlimited appendix though.
Nameless1995 t1_itp6gxm wrote
Reply to comment by Intelligent-Dish-293 in [D] AAAI 2023 Reviews by CauseRevolutionary59
Not too bad.
Nameless1995 t1_itft97w wrote
Reply to comment by Roger_Perfect in [D] AAAI 2023 Reviews by CauseRevolutionary59
I don't know. It would probably depend on your paper's genre, what kind of scores other papers are getting, what kind of cut-off they decide upon, etc. I looked back at my previous submissions. We had a paper at AAAI 2022 with the same scores as yours and it was rejected.
Nameless1995 t1_itevff4 wrote
Reply to comment by loooompen in [D] AAAI 2023 Reviews by CauseRevolutionary59
>This year we received a record 9,251 submissions, of which 9,020 were reviewed. Based on a thorough and rigorous review process we have accepted 1,349 papers. This yields an overall acceptance rate of 15%.
(from last year)
Nameless1995 t1_itejopj wrote
Reply to comment by Roger_Perfect in [D] AAAI 2023 Reviews by CauseRevolutionary59
There is some chance.
Nameless1995 t1_itej85o wrote
Reply to comment by zodiacg in [D] AAAI 2023 Reviews by CauseRevolutionary59
> Award quality: Technically flawless paper with groundbreaking impact on one or more areas of AI, with exceptionally strong evaluation, reproducibility, and resources, and no unaddressed ethical considerations. Top 2% of accepted papers.
> Very Strong Accept: Technically flawless paper with groundbreaking impact on at least one area of AI or excellent impact on multiple areas of AI, with flawless quality, reproducibility, resources, and no unaddressed ethical considerations. Top 15% of accepted papers.
> Strong Accept: Technically strong paper, with novel ideas, high impact on at least one area of AI, with excellent quality, reproducibility, resources, and no unaddressed ethical considerations. Top 30% of accepted papers.
> Accept: Technically solid paper, with high impact on at least one sub-area of AI or modest-to-high impact on more than one area of AI, with good to excellent quality, reproducibility, and if applicable, resources, and no unaddressed ethical considerations. Top 60% of accepted papers.
> Weak Accept: Technically solid, modest-to-high impact paper, with no major concerns with respect to quality, reproducibility, and if applicable, resources, ethical considerations.
> Borderline accept: Technically solid paper where reasons to accept, e.g., good novelty, outweigh reasons to reject, e.g., fair quality. Please use sparingly.
> Borderline reject: Technically solid paper where reasons to reject, e.g., poor novelty, outweigh reasons to accept, e.g. good quality. Please use sparingly.
> Reject: For instance, a paper with poor quality, inadequate reproducibility, incompletely addressed ethical considerations.
> Strong Reject: For instance, a paper with poor quality, limited impact, poor reproducibility, mostly unaddressed ethical considerations.
> Very Strong Reject: For instance, a paper with trivial results, limited novelty, poor impact, or unaddressed ethical considerations.
Nameless1995 t1_irkutz6 wrote
> GPT-3 has a prompt limit of about ~2048 "tokens", which corresponds to about 4 characters in text.
What do you mean by 2048 tokens corresponding to 4 characters? The tokens are at the subword level; they are much bigger than 4 characters.
> this limitation comes from amount of the input neurons
The 2048 limit comes from using trainable positional embeddings. Otherwise, there is a lookup table mapping each possible token to a corresponding embedding, and the same input neurons are used for each embedding. Without trainable position embeddings (for example, if something like relative embeddings are used, or trigonometric functions as in the original Transformer), there is no official "prompt limit" (and some GPT-3-like models probably don't have one). The only limit to how big the input can be in those cases is your GPU (or CPU, depending on what you are using) memory.
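Here's a rough sketch of that contrast (assuming PyTorch; the sizes are illustrative, not GPT-3's actual configuration): a trainable positional embedding is a fixed-size lookup table, so positions beyond its row count simply have no embedding, whereas a sinusoidal encoding is defined for any position index.

```python
import torch
import torch.nn as nn

MAX_LEN, D_MODEL = 2048, 768  # illustrative sizes only

# Trainable positional embeddings: a lookup table with a fixed number of rows,
# so any position index >= MAX_LEN has no embedding -- hence the hard limit.
learned_pos = nn.Embedding(MAX_LEN, D_MODEL)

def sinusoidal_pos(positions, d_model=D_MODEL):
    """Fixed trigonometric encoding (original-Transformer style):
    defined for arbitrarily large position indices, so no hard prompt limit."""
    pos = positions.float().unsqueeze(-1)                  # (seq_len, 1)
    i = torch.arange(0, d_model, 2).float()                # (d_model/2,)
    angles = pos / torch.pow(10000.0, i / d_model)         # (seq_len, d_model/2)
    enc = torch.zeros(positions.size(0), d_model)
    enc[:, 0::2] = torch.sin(angles)
    enc[:, 1::2] = torch.cos(angles)
    return enc

positions = torch.arange(5000)      # longer than MAX_LEN
sinusoidal_pos(positions)           # works for any length
# learned_pos(positions)            # would fail: indices >= 2048 are out of range
```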
But having arbitrarily long input is different from continuous training. With simply continuous training you would be changing the values of existing weights only (and you can already do that: but you will run into issues described by /u/suflaj). However, with arbitrarily long input prompt, you would be adding more and more tokens to attend -- increasing computational complexity massively. Even without an official prompt limit, you will most likely run into practical limit at around 2K tokens anyway. One way to resolve that would be to use the model semi-recurrently with few hidden states compressing an arbitrarily long past, so that attention remains bounded in complexity. But that would also mean a lot of past information would be lost since there is a limit to how much you can compress into some bounded budget. Although you can probably extend this paradigm by building a massive dictionary of memories while compressing past input, and then using sparse & fast top-k retrieval for bound computation. Someone would probably make something like that someday.
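A hypothetical sketch of the semi-recurrent idea (plain PyTorch; the compression rule is a toy stand-in, not any specific published method): each incoming chunk is folded into a fixed-size memory, so the per-step attention cost stays bounded while the past is lossily compressed.

```python
import torch

# Toy bounded-memory scheme: instead of attending over an ever-growing past,
# keep a fixed number of memory slots and fold each new chunk into them.
d, mem_slots, chunk_len = 64, 32, 128
memory = torch.zeros(mem_slots, d)

def compress_into_memory(memory, chunk):
    """Attend from the memory slots to the new chunk and mix the result in.
    Real systems would use learned compression; this is just the shape of the idea."""
    scores = torch.softmax(memory @ chunk.T / d ** 0.5, dim=-1)  # (mem_slots, chunk_len)
    summary = scores @ chunk                                      # (mem_slots, d)
    return 0.5 * memory + 0.5 * summary

for _ in range(10):                     # a stream of chunks, arbitrarily long in principle
    chunk = torch.randn(chunk_len, d)
    memory = compress_into_memory(memory, chunk)
```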
Nameless1995 t1_iqus5e6 wrote
Reply to [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
DNN weights are static (the same for all inputs). Attention weights are dynamic (input-dependent). In this sense, attention weights are a sort of "fast weights".
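A toy contrast in PyTorch (illustrative only): a linear layer applies the same weight matrix to every input, while attention recomputes its weighting over positions from each input.

```python
import torch
import torch.nn.functional as F

d = 16
x = torch.randn(10, d)                  # 10 token vectors

# Static weights: the same matrix W is applied regardless of the input.
W = torch.randn(d, d)
static_out = x @ W

# Attention weights: computed from the input itself, so they change with x
# ("fast weights" in the sense of being produced on the fly per input).
q, k, v = x @ W, x @ W, x @ W           # toy projections (shared here for brevity)
attn = F.softmax(q @ k.T / d ** 0.5, dim=-1)   # (10, 10), input-dependent
dynamic_out = attn @ v
```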
Nameless1995 t1_ivpnf6f wrote
Reply to comment by geneing in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
> They essentially learn probabilities of different word combinations.
This isn't dichotomous with following a set of rules. The rules operate at a deeper, less interpretable level (some may say "subsymbolic") compared to GOFAI. The whole setup of model + gradient descent corresponds to having some update rules (based on partial differentiation and such). In practice they aren't fully continuous either (though in theory they are), because of floating-point approximations and the underlying digitization.
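A tiny illustration of that point (plain PyTorch, purely illustrative): the "learning" itself is a fixed update rule applied to the weights, driven by partial derivatives.

```python
import torch

# One learnable weight vector and a fixed update rule: compute partial
# derivatives of a loss and step against them. The learning procedure is
# itself rule-following, just at a less interpretable level.
w = torch.randn(3, requires_grad=True)
x, y = torch.randn(3), torch.tensor(1.0)
lr = 0.1

for _ in range(100):
    loss = (w @ x - y) ** 2          # squared error
    loss.backward()                  # partial derivatives d(loss)/d(w)
    with torch.no_grad():
        w -= lr * w.grad             # the update rule
        w.grad.zero_()
```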