Nameless1995
Nameless1995 t1_iziy3gn wrote
Reply to comment by Readityesterday2 in [D] Simple Questions Thread by AutoModerator
What do you mean by "desktop alternatives"? You mean something you can train on a single GPU or two? I don't think you would get any real alternative for that unless you lower your expectations by a lot. But for more open source stuff, you can check the GPT-style models from https://www.eleuther.ai/ and others like BlenderBot, BLOOM, etc. For image generation, probably Stable Diffusion or something.
Nameless1995 t1_izit9gd wrote
Perhaps you can try cross-validation.
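Something like this, for example (a minimal sketch assuming a scikit-learn-style setup; the data and model here are just placeholders):

```python
# Minimal k-fold cross-validation sketch (assumes scikit-learn is available).
# X, y and the classifier choice are placeholders for your own data/model.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X = np.random.randn(100, 20)           # stand-in features
y = np.random.randint(0, 2, size=100)  # stand-in labels

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```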
Nameless1995 t1_izgkbq7 wrote
Reply to comment by Cyalas in Personal project for PhDs and scientists [P] by Cyalas
I don't think it was sorted, IIRC, but maybe I missed something. Also, is there a way to sort in both ascending and descending directions?
Nameless1995 t1_izgd34j wrote
Reply to comment by Cyalas in Personal project for PhDs and scientists [P] by Cyalas
I think I was using it in the wrong way by giving specific titles. Using keywords provides more interesting results. It looks nice, but I'm not sure where I stand with it. For example, I tried "attention" (perhaps too broad a search term) and only got a few papers back (even the Transformer paper is missing). Should there have been a paging mechanism? Also, sorting doesn't seem to be working (sorting by date didn't change anything).
Nameless1995 t1_izg72td wrote
Reply to Personal project for PhDs and scientists [P] by Cyalas
Seems like it needs some work. I searched for some titles but didn't get any results (one was a 2019 arXiv paper). I found a result for https://arxiv.org/abs/1707.02786, but it only shows the author name with et al., without the title. The review seemed pretty basic -- just a summarization of the abstract (I am also a bit confused by your description: are you attempting a literature review, where other related papers are suggested and described, or a review in the sense of what reviewers at a conference provide?). And even the structured abstract didn't add much structure, at least for this paper. I don't know, maybe I got unlucky with the specific papers I tried.
Edit: okay, I see what you are doing with "review". You are basically generating "citation sentences". I am not sure how useful that is as a feature, because it requires minimal effort to do in practice, but some may find it useful.
Nameless1995 t1_iz92pe9 wrote
Reply to comment by huberloss in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
> "Grammar Induction and Parsing with a Recursive Neural Network" by Stephen Clark and James R. Curran (2007) - This paper introduced the use of recursive neural networks for natural language processing tasks.
Is this one hallucinated? Couldn't find it.
Some others seem hallucinated too, although they are semantically related to the kind of things those authors work on.
Nameless1995 t1_iz2j0ja wrote
Reply to comment by ktpr in [R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton] by shitboots
Incidentally, Hinton has a lot of professional experience in psychology/cognitive science: https://www.cs.toronto.edu/~hinton/fullcv.pdf
> Jan 76 - Sept 78 Research Fellow, Cognitive Studies Program, Sussex University, England
> Oct 78 - Sept 80 Visiting Scholar, Program in Cognitive Science, University of California, San Diego
> Oct 80 - Sept 82 Scientific Officer, MRC Applied Psychology Unit, Cambridge, England
> Jan 82 - June 82 Visiting Assistant Professor, Psychology Department, University of California, San Diego
Nameless1995 t1_iyz4iie wrote
Reply to comment by ReadSeparate in [D] OpenAI’s ChatGPT is unbelievable good in telling stories! by Far_Pineapple770
Yes, this is partly done in semi-recurrent Transformers. The model has to decide which information it needs to store in the compressed recurrent chunk-wise memory for the future.
What you have in mind is probably closer to a form of "long-term memory", while, arguably, what the semi-recurrent transformer models is better short-term memory (although S4 itself can model strong long-term dependencies, I'm not sure how that would translate to the more complex real world), i.e., it recurrently updates some k vectors (which can serve as a short-term or working memory). In theory, the short-term memory as implemented in semi-recurrent transformers may still give access to information from far back in the past, so "short-term" may be a misnomer (perhaps "working" memory is the better term), but the limitation is that its bandwidth is still low (analogous to our own working memory): everything in the past beyond the chunk window needs to be compressed into some k vectors. This may suffice for practical uses like a conversation over a few hours, but perhaps not for "life-time agents" that build up their own profile through a lifetime of memory (I would be skeptical that our "slow" memory of salient things we have experienced throughout life can be compressed into a few vectors of a recurrent memory).
However, aspects of the solution to that problem are also already here. For example, the memorizing transformer paper (https://arxiv.org/abs/2203.08913) that I mentioned already allows kNN retrieval over the model's whole past representations (which can be a lifetime of conversational history, without compression). Basically, in this case "everything is stored" but only relevant things are retrieved as needed by a kNN lookup, so the burden of learning what to "store" is removed and the main burden is in retrieval -- finding the top-k relevant items from memory. If we need to bound total memory, we can add an adaptive deletion mechanism based on, for example, surprisal: "more surprising" information (quantifiable via prediction difficulty, which is easy to measure with NNs) can be made more persistent in memory, i.e., more resistant to deletion. This is similar to retrieval-augmented generation, where the model retrieves information from external sources like Wikipedia, except the same kind of technique is pointed at the model's own past. The combination of this kNN retrieval with a more local "working memory" (from the semi-recurrent transformer papers) could potentially be much more powerful. I think most of the elementary tools for making some uber-powerful model (leaps beyond GPT) are already here; the challenge is in engineering -- making a scalable solution given limited computation and developing an elegant integration (but with the rise in brute computational power, the challenges will only grow weaker even if we don't come up with many new concepts on the modeling side).
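A toy sketch of the kNN-memory idea (my own illustration of the core mechanism, not the memorizing transformer's actual code; all names and shapes here are made up):

```python
# Toy sketch of kNN retrieval over a model's own past representations:
# store (key, value) pairs from past chunks, retrieve top-k by similarity.
import torch

class KNNMemory:
    def __init__(self, dim, max_size=100_000):
        self.keys = torch.empty(0, dim)
        self.values = torch.empty(0, dim)
        self.max_size = max_size

    def add(self, keys, values):
        # Store new past representations; drop the oldest if over budget
        # (a surprisal-based deletion policy could replace FIFO here).
        self.keys = torch.cat([self.keys, keys])[-self.max_size:]
        self.values = torch.cat([self.values, values])[-self.max_size:]

    def retrieve(self, queries, k=32):
        # Return the top-k most similar stored values for each query.
        if self.keys.shape[0] == 0:
            return None
        sims = queries @ self.keys.T                      # (q, n) similarities
        topk = sims.topk(min(k, sims.shape[-1]), dim=-1)
        return self.values[topk.indices]                  # (q, k, dim)

memory = KNNMemory(dim=64)
memory.add(torch.randn(128, 64), torch.randn(128, 64))   # a "past" chunk
retrieved = memory.retrieve(torch.randn(4, 64), k=8)      # current queries
print(retrieved.shape)  # torch.Size([4, 8, 64])
```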
Nameless1995 t1_iyyu1lg wrote
Reply to comment by SkeeringReal in [D] Score 4.5 GNN paper from Muhan Zhang at Peking University was amazingly accepted by NeurIPS 2022 by Even_Stay3387
> if it's a bad paper who cares if it's. published or not, no one will read it regardless.
How do we know it's a bad paper without reading it?
Nameless1995 t1_iyytwtb wrote
Reply to comment by Even_Stay3387 in [D] Score 4.5 GNN paper from Muhan Zhang at Peking University was amazingly accepted by NeurIPS 2022 by Even_Stay3387
I checked the review engagement. Reviewers 1 and 2 were willing to give borderline accept/weak accept. Even for Reviewer 1, the authors had the final say, and Reviewer 1 didn't respond further.
Reviewers 3 and 4 gave weak reject/borderline reject.
Reviewer 3 was ultimately hung up only on the paper not providing a formal proof for some aspects (and seemed to have implicitly accepted that the other concerns were addressed). In the end the authors claimed that they provided the formal proof, but Reviewer 3 didn't respond further. Reviewer 4 didn't engage at all.
So I don't think it's "ridiculous" to say that the reviews are outdated. And ideally, we don't want the meta-reviewer to just average scores (otherwise there is no point in having a meta-reviewer; just use a calculator and accept papers based on scores -- that would simplify the whole pipeline if that's really what we want).
Nameless1995 t1_iyysho5 wrote
Reply to comment by crouching_dragon_420 in [D] Score 4.5 GNN paper from Muhan Zhang at Peking University was amazingly accepted by NeurIPS 2022 by Even_Stay3387
> who can be PhD students that got assigned papers to review by their supervisors
Or the conference. PhD students can also get review requests and assignments directly from the conference. As a PhD student, I review stuff that my supervisor doesn't know about.
Nameless1995 t1_iyyl3m5 wrote
Reply to comment by ReadSeparate in [D] OpenAI’s ChatGPT is unbelievable good in telling stories! by Far_Pineapple770
Technically, LaMDA already uses an "external database", i.e., external tools (the internet, a calculator, etc.), to retrieve information:
https://arxiv.org/pdf/2201.08239.pdf (Section 6.2)
It doesn't solve /u/ThePahtomPhoton's memory problem (I don't remember what GPT3's exact solution is), but solutions already exist (just not scaled up to GPT3 level).
One solution is using a kNN lookup in a non-differentiable manner: https://arxiv.org/abs/2203.08913
Another solution is making Transformers semi-recurrent (process within chunks in parallel, then process some coarse compressed chunk representation sequentially). This allows information to be carried forward through the sequential process (a toy sketch is at the end of this comment):
https://arxiv.org/pdf/2203.07852
https://openreview.net/forum?id=mq-8p5pUnEX
Another solution is to augment the Transformer with a state space model, which has shown great promise on the Long Range Arena:
https://arxiv.org/abs/2206.13947
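To make the semi-recurrent idea a bit more concrete, here is a toy sketch (my own simplification, not the architecture from the linked papers): full attention inside each chunk in parallel, plus a small compressed state carried across chunks sequentially.

```python
# Toy sketch of the semi-recurrent idea: attention inside each chunk
# (parallel), plus a small recurrent state carried across chunks (sequential).
# This is my own simplification, not the linked papers' architectures.
import torch
import torch.nn as nn

class SemiRecurrentBlock(nn.Module):
    def __init__(self, dim, chunk_size=64, n_state=4):
        super().__init__()
        self.chunk_size = chunk_size
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.compress = nn.Linear(dim, dim)
        self.state0 = nn.Parameter(torch.zeros(n_state, dim))

    def forward(self, x):  # x: (batch, seq, dim), seq divisible by chunk_size
        b, s, d = x.shape
        chunks = x.view(b, s // self.chunk_size, self.chunk_size, d)
        state = self.state0.expand(b, -1, -1)  # (b, n_state, d) carried memory
        outputs = []
        for i in range(chunks.shape[1]):       # sequential over chunks
            chunk = chunks[:, i]
            # Attend over [carried state ; current chunk] in parallel.
            ctx = torch.cat([state, chunk], dim=1)
            out, _ = self.attn(chunk, ctx, ctx)
            outputs.append(out)
            # Compress the chunk into the next carried state (mean-pooled).
            pooled = self.compress(out.mean(dim=1, keepdim=True))
            state = torch.cat([state[:, 1:], pooled], dim=1)
        return torch.cat(outputs, dim=1)

block = SemiRecurrentBlock(dim=32, chunk_size=8, n_state=4)
print(block(torch.randn(2, 32, 32)).shape)  # torch.Size([2, 32, 32])
```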
Nameless1995 t1_iy81y83 wrote
> A thing which is taught by a certain master, and which is rightly taught by him; and he who taught it, and has taught it also, is good in so far as it is taught?
Kind of ironic, given that according to the texts this is precisely what Socrates (or Plato, through his character of Socrates) denies to be possible (that virtue can be taught).
Nameless1995 t1_ix6hzd0 wrote
Reply to comment by daking999 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Yeah, that would be an interesting scalable way to get human feedback. Perhaps, someone is already doing it.
Nameless1995 t1_ix6gugr wrote
Reply to comment by brates09 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
The jury is probably still out on that. Initially, IIRC, BERT was posited to be better than GPT-style training because of the bidirectionality in its modeling, and that was shown empirically too. But GPT-style models won out by scaling up much more. There may be some truth to what you said: because GPT gets much more training signal per iteration, it may ultimately give better results once scaled up. But I am not entirely sure why BERT-style models were not scaled up as much (did people never try out of a priori hesitancy, or did they try, not get good results, and not report them?). Another issue is the rise of prompting, which is much more in tune with autoregressive unidirectional training and falls out much more naturally from GPT-style training.
However, T5-style training is, to an extent, closer to BERT (T5 has a bidirectional encoder and a causal decoder, but the decoder only predicts some masked spans). Recently, this paper showed that you can get performance on par with a fully causal decoder by using a scaled-up T5-style model with a trick: https://arxiv.org/abs/2209.14500 ... again, this may not get much practical purchase given how expensive SAP (the trick in the paper) can be, but the question is open -- and perhaps there is a better middle way somewhere.
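A toy illustration of the difference in training signal (hand-rolled, not any paper's code): a causal LM gets a loss at every position, while a BERT/T5-style masked objective only supervises the masked positions.

```python
# Toy illustration of the training-signal difference.
# Causal LM: every position predicts the next token -> loss at every position.
# Masked/span LM (BERT/T5-style): loss only at the ~15% masked positions.
import torch
import torch.nn.functional as F

vocab, seq_len = 1000, 12
tokens = torch.randint(0, vocab, (1, seq_len))
logits = torch.randn(1, seq_len, vocab)  # stand-in for model outputs

# Causal LM loss: shift by one, supervise seq_len - 1 positions.
causal_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)

# Masked LM loss: supervise only the masked positions (here, 2 of 12).
mask = torch.zeros(1, seq_len, dtype=torch.bool)
mask[0, [3, 7]] = True
masked_loss = F.cross_entropy(logits[mask], tokens[mask])

print(causal_loss.item(), masked_loss.item())
```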
Nameless1995 t1_ix4vylx wrote
Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
tl;dr: what you are suggesting is much, much harder to do than just letting an LM go brrr on large-scale internet text; moreover, you are probably overestimating the benefits your suggestion would provide. Research goes in the direction of whatever is easier to set up ATM.
For passive training you can just take the language model and feed it gigabytes of internet data.
It's much harder to do the same in a more interactive setting where some expert provides real-time online feedback based on what the language model is doing. Where do you find the experts? Do you get humans in the loop? You may, but a bunch of humans in a loop, even for years, can't hope to match the scale of data you can train on passively.
It's also very likely that language models wouldn't learn that well from a simple human-in-the-loop setting (at least if current Transformers start from blank-slate random initializations), even if nurtured like a baby for years. First, they would lack the many other forms of multimodal interactive signal that a human gets in similar settings. Implementing a fully multimodally grounded model efficiently is nontrivial, if not impossible (much less trivial than making a simple language model do its thing -- it's an area that will probably require more research, although there is progress, like PaLM-SayCan and Gato). Second, humans may possess better inductive biases from the get-go, potentially for evolutionary reasons, making them more sample-efficient than randomly initialized language models.
Both of those limitations may be partially counteracted by large-scale internet text training (which would also be much faster than training the model like a human baby, since the latter is limited by the slowness of human trainers).
Moreover, it's not really an either-or. You can both train a model passively and start with that to "initialize the model", and then fine-tune it in human-in-the-loop style settings or using RL: https://arxiv.org/abs/2203.02155
Moreover, you can think of "passive learning" as a form of interaction with an environment too. The model predicts the future state of the environment (the next word), and the environment returns the "true word" (albeit independently of the model's action). A "reward" is then calculated (based on the agent's action, i.e., its prediction) from the cross-entropy between the model's predicted distribution and the "true word" the environment provides -- except in this case there isn't a live environment, just pre-recorded offline demonstrations of past environmental dynamics (human communication).
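A toy framing of that analogy (purely illustrative; the "agent" here is just a uniform-distribution stand-in, not a real model):

```python
# Toy framing of next-word prediction as interaction with a "pre-recorded"
# environment: the agent emits a distribution over the next word, the
# environment reveals the true next word, and the per-step log-likelihood
# (negative cross-entropy) plays the role of a reward.
import math

corpus = "the cat sat on the mat".split()       # pre-recorded "environment"
vocab = sorted(set(corpus))

def agent_predict(context):
    # Stand-in policy: a uniform distribution over the vocabulary.
    return {w: 1.0 / len(vocab) for w in vocab}

total_reward = 0.0
for t in range(1, len(corpus)):
    prediction = agent_predict(corpus[:t])      # agent's "action"
    true_word = corpus[t]                       # environment's response
    reward = math.log(prediction[true_word])    # -cross_entropy for this step
    total_reward += reward

print(f"total log-likelihood ('return'): {total_reward:.3f}")
```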
Nameless1995 t1_iwvghxr wrote
Reply to Utilitarianism is the only option — but you have to take conscious experience seriously first by Squark09
- Even if we assume that conscious valence provides an ought for the conscious being that has the valence, it's not clear how you can universalize valence maximization by separating it from particular individuals. Having an ought from my own conscious valence doesn't immediately imply that I am obligated to, say, sacrifice my conscious valence for the "overall maximization of some utility function accommodating the valence of all beings", if I am not already disposed towards selflessness.
- Open Individualism may help with the above concern, but it's more controversial and niche than utilitarianism itself. It kind of undermines the whole project when, to support X, you have to rely on something even more controversial than X. Either way, I also don't see why I should care for some "metaphysical unity/lack of separation" -- which can be up to how we use language. I don't see why the boundedness of consciousnesses (the restriction from accessing others' consciousness unmediated) isn't enough to ground individuation and separation, irrespective of whether all the separate perspectives are united in a higher-dimensional spatial manifold, God, the Neoplatonic One, or what have you. It's unclear to me that such abstract metaphysical unities really matter. We don't individuate things based on their being completely causally isolated and separated from the world.
- I don't see why a proper normative theory shouldn't be applicable and scalable to hypothetical and idealized scenarios. A lack of robustness to hypotheticals should count as a "bug", and a proper reason should be given as to why the theory doesn't apply there. Real-life scenarios are complex; before just blindly applying a theory we need some assurance. Idealized scenarios and hypotheticals allow us to "stress test" our theories to gain that assurance. Ignoring them because "they are not realistic" doesn't sound very pleasant.
- I don't see how logarithmic scaling helps with the repugnant conclusion. The repugnant conclusion comes from the idea that x beings with total utility y, each enjoying high-quality utility, can always be overtaken by a seemingly less ideal scenario where m beings (m >>> x) exist with total utility z (z > y), but each of the m beings has low-quality utility. I don't see what changes if individual happiness grows logarithmically (you can always adjust the numbers to make it a problem), and I don't see what changes if there is an underlying unitary consciousness behind it all. Is the "same" consciousness having many low-quality experiences really better than it having fewer but high-quality experiences?
- I also don't see the meaning of calling it the "same" consciousness if it doesn't have a single unified experience (solipsistic).
Nameless1995 t1_iw05d36 wrote
There isn't an established standard AFAIK.
EDA is a simple baseline for augmentation: https://arxiv.org/abs/1901.11196
(See its citations on Google Scholar for more recent work; a rough sketch of what EDA-style augmentation does is at the end of this comment.)
(Recent papers are playing around with counterfactual augmentation and such, but I'm not sure any standard, stable technique has emerged.)
This one had nice low resource performance: https://arxiv.org/pdf/2106.05469.pdf
Also this: https://aclanthology.org/2021.emnlp-main.749.pdf (you can find some new stuff from citations in google scholar/semantic scholar).
I think prompt tuning and contrastive learning (https://openreview.net/pdf?id=cu7IUiOhujH) also showed better performance in very low-resource settings, but the benefit tapers off as you add data.
If you are seeking adversarial robustness, there are other techniques for that. I think FreeLB was popular a while ago. There's also SAM for flatter minima.
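In case it helps, here's a rough sketch of what EDA-style augmentation amounts to (simplified; the real EDA also does synonym replacement/insertion via WordNet, which I've omitted here):

```python
# Rough sketch of EDA-style text augmentation (simplified; the real EDA also
# does synonym replacement/insertion via WordNet, omitted here).
import random

def random_swap(words, n=1):
    # Swap two random word positions n times.
    words = words[:]
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # Drop each word with probability p; keep at least one word.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def eda_like_augment(sentence, n_aug=4):
    words = sentence.split()
    augmented = []
    for _ in range(n_aug):
        op = random.choice([random_swap, random_deletion])
        augmented.append(" ".join(op(words)))
    return augmented

print(eda_like_augment("the quick brown fox jumps over the lazy dog"))
```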
Nameless1995 t1_ivxosmw wrote
Reply to comment by red75prime in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
I don't think Searle denies that so I don't know who you are referring to.
Here's quote from Searle:
> "Could a machine think?"
> The answer is, obviously, yes. We are precisely such machines
> Yes, but could an artifact, a man-made machine think?"
> Assuming it is possible to produce artificially a machine with a nervous system, neurons with axons and dendrites, and all the rest of it, sufficiently like ours, again the answer to the question seems to be obviously, yes. If you can exactly duplicate the causes, you could duplicate the effects. And indeed it might be possible to produce consciousness, intentionality, and all the rest of it using some other sorts of chemical principles than those that human beings use. It is, as I said, an empirical question. "OK, but could a digital computer think?" If by "digital computer" we mean anything at all that has a level of description where it can correctly be described as the instantiation of a computer program, then again the answer is, of course, yes, since we are the instantiations of any number of computer programs, and we can think.
Nameless1995 t1_ivq9tkh wrote
Reply to comment by trutheality in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
> If understanding arises in a program, it is not going to happen at the level of abstraction to which Searle is repeatedly returning.
There is a bit nuance here.
What Searle is trying to say with "programs don't understand" is not that there cannot be physical instantiations of "rule-following" programs that understand (Searle allows that our brains are precisely one such physical instantiation), but that there would be some awkward realizations of the same program that don't understand. So the point is actually relevant at a higher level of abstraction.
> Ultimately in these "AI can't do X" arguments there is a consistent failure to apply the same standards to both machines and humans, and, as you point out, a failure to provide falsifiable definitions for the "uniquely human" qualities being tested, be it understanding, qualia, originality, or what have you.
Right. Searle's point becomes even more confusing because, on the one hand, he explicitly allows that "rule-following machines" can understand (he explicitly says that instances of appropriate rule-following programs may understand things, and also that we are machines that understand), while at the same time he doesn't think a mere simulation of a program's functions by an arbitrary implementation of rule-following is enough. But then it becomes hard to tease out what exactly "intentionality" is for Searle, and why certain instances of rule-following through certain causal powers can have it, while the same rules simulated otherwise in the same world don't.
Personally, I think he was sort of thinking in terms of the hard problem (before the hard problem was formulated as such; it existed in different forms). He was possibly conflating understanding with having phenomenal "what it is like" consciousness of a certain kind.
> consistent failure to apply the same standards to both machines and humans
Yeah, I notice that. While there are possibly a lot of things we don't completely understand about ourselves, there also seems to be a tendency to overinflate ourselves. As for myself, if I reflect first-personally, I have no clue what exactly it is I do when I "understand". There were times, when I was younger, when I thought that I don't "really" "understand" anything. Whatever happens, happens on its own; I can't even specify the exact rules of how I recognize faces, how I process concepts, or even what "concepts" are in the first place. Almost everything involved in "understanding anything" is beyond my exact conscious access.
Nameless1995 t1_ivpyckd wrote
Reply to comment by Nameless1995 in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
(5) Given that I cannot really get a positive conception of understanding beyond possessing relevant skills, I don't have a problem broadening the notion of understanding to abstract out phenomenality (which may play a contingent (not necessary) role in realizing understanding through "phenomenal powers"). So on that note, I have no problem allowing understanding (or at least a little bit of it) to a system of "paper + dumb rule-follower + rules" as a whole, producing empty symbols in response to other empty symbols in a systematic manner such that it is possible to map the inputs and outputs in a way that can be interpreted as a function doing arithmetic. You may say that these symbols are "meaningless" and empty. However, I don't think "meaning" even exists beyond functional interactions. "Meaning", as we use the term, simply serves as another symbol to simplify communication about complicated phenomena, where the communication itself is just causal back-and-forth. Even qualia, to me, are just "empty symbols" gaining meaning not intrinsically but from their functional properties in grounding reactions to changes in world states.
(6) Note I said "a little bit of it" regarding the understanding of arithmetic by the system "paper + dumb rule-follower + some book of rules" as a whole. This is because I take understanding to be a matter of degree. The degree increases with increases in relevant skills (for example, if the system can talk about advanced number theory (or has functional characteristics mappable to that), or talk about complex philosophical topics concerning the metaphysics of numbers, then I would count that as a "deeper understanding of arithmetic").
(7) 5/6 can be counter-intuitive. But the challenge here is to find an interesting positive feature of understanding that the system lacks. We can probably decide on some functional characteristics, or some requirement of low-level instantiation details (beyond high-level simulation of computational formal relations), if we want (I personally don't care either way) to restrict paper + dumb rule-followers from simulating super-intelligence. But phenomenality doesn't seem too interesting to me, and even intentionality is a bit nebulous (and controversial; I also relate to intentional talk being simply a stance that we take to talk about experiences and thoughts, rather than something metaphysically intrinsic to phenomenal feels (https://ase.tufts.edu/cogstud/dennett/papers/intentionalsystems.pdf)). Some weaker notion of intentionality can definitely already be allowed in any system's behavior (including a paper-based TM simulation, as long as it is connected to a world for input-output signals). Part of the counter-intuitive force may come from the fact that our usage of words, and our sense of whether a word x applies in a context y, is somewhat rooted in internal statistical models (the "intuition" that it doesn't feel right to say the system "understands" is the feeling of ill-fittingness due to our internal statistical models). However, if I am correct that words have no determinate meaning -- they may be too loose or even have contradicting usages -- then in our effort to clean them up through conceptual engineering it may be inevitable that some intuition needs to be sacrificed (because our intuitions themselves can be internally inconsistent). Personally, considering both ways, I am happier to bite the bullet here and allow paper + dumb writers to understand things as a whole even when the individual parts don't: this simply follows from my minimalist definition of understanding revolving around high-level demonstrable skills. I feel like more idealized notions are hard to define and get into mysterian territory while also unnecessarily complicating the word.
Nameless1995 t1_ivpyc47 wrote
Reply to comment by PrivateFrank in [D] What does it mean for an AI to understand? (Chinese Room Argument) - MLST Video by timscarfe
> Intentionality and understanding and first-person (phenomonologistic) concepts, and I think that's enough to have the discussion. We know what it is like to understand something or have intentionality. Intentionality in particular is a word made up to capture a flavour of first-person experience of having thoughts which are about something.
> I think that to have "understanding" absolutely requires phenomenal consciousness. Or the "understanding" in an AI has could be the same as how much a piece of paper understands the words written upon it. At the same time, none of the ink on that page is about anything - it just is. There's no intentionality there.
That's exactly Searle's bigger point with the CRA. The CRA is basically in the same line as the Chinese Nation, Dneprov's game, paper simulations, and similar arguments (not particularly novel). Searle's larger point is that although machines may understand (according to Searle, we are such machines), programs don't understand simply in virtue of the characteristics of the programs. That is, particular instantiations of programs (and we may be such instantiations) may understand, depending on the underlying causal powers and connections instantiating the program, but not every arbitrary instantiation will understand (for example, "write symbols on paper" or "arrange stones" instantiations).
The stress is exactly that an arbitrary instantiation may not have the relevant "intentionality" (the point gets muddied by the focus on the simplicity of the CRA).
However, a couple of points:
(1) First, I think at the beginning of disagreements it's important to set up the semantics. Personally, I take a distributionalist attitude towards language. First, I am relatively externalist about the meaning of words (meanings are not decided by solo people in the mind; they are grounded in how the word is used in public). Second, like Quine, I don't think most words have a clean, determinate "meaning", especially words like "understanding". This is because there are divergences in how different individuals use words, and there may be no consistent unifying "necessary and sufficient" rules explaining the use of a word (because some usages by different people may contradict each other). Such issues don't show up much in day-to-day discourse, but they become more touchy when it comes to philosophy. So what do we do? I think the answer is something like "conceptual engineering". We can make suggestions about refinements and further specifications of nebulous concepts like "understanding" when necessary for some debate. It's then up to the community as a whole to accept and normalize those usages in the relevant contexts, or to counter-suggest alternatives.
(2) With the background set up in (1), I believe the meaning of "understanding" is indeterminate. From a conceptual engineering perspective, I feel we can go down multiple different, mutually exclusive branches in trying to refine the concept of "understanding". However, we also need some constraints (ideally, we keep the word's usage roughly similar to how it is).
(3) One way to make the notion of "understanding" more determinate is to simply stipulate the need for "intentionality" and "phenomenality" for understanding.
(4) I think making intentionality essential for understanding is fair (I will come back to this point), but I am not sure "phenomenality" is as essential for understanding (it's not needed for the word to roughly correspond to how we use it).
(4.1) First, note that it is widely accepted in philosophy of mind that intentionality doesn't require phenomenality. There is also a debate on whether intentionality ever has a phenomenal component at all. For example, Tye may argue that in a phenomenal field there are just sensations, sounds, images, imagery, etc., and no "thoughts" representing things as being about something. When thinking about "1+1=2", Tye may say the thought itself is not phenomenally present; instead you would just have some phenomenal sensory experiences associated with the thought (perhaps a vague auditory experience of "1+1=2", perhaps some visual experiences of the symbols 1, +1, =2 in imagination, and so on). Functionalists can have a fully externalist account of intentionality. They may say that some representation in a mental state being "about" some object in the world is simply a matter of having the relevant causal connection (which can be achieved by any arbitrary instantiation of a program with a proper interface to the IO signals -- "embedding it in the world") or the relevant evolutionary history (e.g., teleosemantics) behind the selection of the representation-producer-consumer mechanism. This leads to the so-called causal theories of intentionality. They would probably reject intentionality as being some "internal" flavor of first-person experience, grounding it instead in the embodiment of the agent.
(4.2) Note also Anscombe's work on intentional theories of perception, which was one of the starting points for intentional theories -- she was pretty neutral on the metaphysics of intentionality and took a very anti-reificationist stance, even close to treating it as more of a linguistic device. She also distinguished intentional content from mental objects. For example, if someone's worshipping has the intentional object Zeus (they worship Zeus), then it's wrong to say that the intentional content is "the idea of Zeus" (because it's wrong to say that the subject is simply worshipping the idea of Zeus; the subject's 'intention' is to worship the true Zeus, who happens not to exist -- but that's irrelevant). This raises the question: what kind of metaphysical states can even constitute or ground this "subject taking the intentional content of her worship to be Zeus"? (The intentional content is not exactly the idea of Zeus, or the Zeus imagery the subject may experience -- but then what does this "intentionality" correspond to?) After thinking about it, I couldn't come up with any sensible answer besides going back to functionalism: the intentional object of the subject is Zeus because that's the best way to describe her behaviors and functional dispositions.
(4.3) Those aren't the only perspectives on intentionality, of course. There are, for example, works by Brentano and approaches to intentionality from core phenomenological perspectives. But it's not entirely clear that there is some sort of differentiable "phenomenal intentionality" in phenomenality. I don't know whether I distinctly experience "aboutness", rather than simply having a tendency to use "object-oriented language" in my thinking (which itself isn't distinctly or obviously phenomenal in nature). Moreover, while my understanding of certain concepts may phenomenally "feel" a certain way, that seems like a very poor account of understanding. Understanding is not a "feeling", nor is it clear why having the "phenomenal sense" of "understanding xyz" is necessary. Instead, upon reflecting on what, for example, my "understanding of arithmetic" consists in, I don't find it to be necessarily associated with qualitatively feeling a certain way alongside dispositions to say or think things about numbers, +, and - (those happen, and the "qualitative feeling" may serve as a "mark" signifying understanding -- "signifying" in a functional, correlational sense); rather, it seems most meaningfully constituted by the possession of "skills" (the ability to think about arithmetic, solve arithmetic problems, etc.). This again leads to functionalism. If I try to think beyond that, I find nothing determinate in first-person experience constituting "understanding".
Nameless1995 t1_izj36ff wrote
Reply to comment by Accomplished-Bill-45 in [D] Simple Questions Thread by AutoModerator
I think in principle, if you have enough resources and investigate the right fine-tuning techniques, you can get SOTA out of them. However, at the moment it's quite new, and moreover not very open-access for research. Furthermore, RL training is not easy for a random researcher to do (because it's a kind of human-in-the-loop framework and you need human annotations -- you can probably do it with AWS and such, but it probably won't become standard too soon because of the inconvenience).
Another thing is that ELLMs (let's say "extra-large language models", to distinguish GPT-3+-style models from BERT/BART/RoBERTa/GPT-2-style models) are generally used in few-shot or instruction-following setups, and probably won't fit exactly with a "fine-tuning on the whole dataset" setup. And again, it can be hard for random researchers to fine-tune or even run those humongous models. So it may take time for them to seep in everywhere.
In my investigation, ChatGPT still seems to struggle a bit on some of the harder logical challenges (some of which even I struggled with), e.g., in LogiQA: https://docs.google.com/document/d/1PATTi0hmalBvY_YQFr4gQrjDqfnEUm8ZDkG20J6U1aQ/edit?usp=sharing
(although you can probably improve on this with more specialized RL training for logical reasoning + generating multiple reasoning paths + self-consistency checking + least-to-most prompting, etc. -- a sketch of the self-consistency part is below)
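Here's a sketch of how the self-consistency part can be wired up (the `generate` function is a hypothetical stand-in for whatever LLM API you use, and the prompt/answer format is just one possibility):

```python
# Sketch of chain-of-thought + self-consistency: sample several reasoning
# paths at nonzero temperature, extract each final answer, majority-vote.
from collections import Counter
import re

COT_PROMPT = "Q: {question}\nA: Let's think step by step."

def generate(prompt: str, temperature: float = 0.7) -> str:
    # Placeholder: call your model of choice here and return its text output.
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    # Assumes the model ends with something like "The answer is (B)".
    match = re.search(r"answer is \(?([A-D])\)?", completion)
    return match.group(1) if match else "?"

def self_consistent_answer(question: str, n_samples: int = 10) -> str:
    prompt = COT_PROMPT.format(question=question)
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # majority vote
```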
I think the SOTA on LogiQA is: https://aclanthology.org/2022.findings-acl.276/ (you can find relevant papers by looking at the citation network in Semantic Scholar)
For reasoning in other areas, you can probably use the chain-of-thought paper and its citations to keep track (because CoT is almost a landmark in prompt engineering for enhanced reasoning, and most future ELLM papers working on reasoning will probably cite it).
Don't know much about common-sense reasoning (either as a human or in terms of research in that area).