Recent comments in /f/MachineLearning

turfptax OP t1_jdws798 wrote

Thank you!

I have friends with other sensor systems, but my goal is also to provide a platform for all types of biometric sensors, paired with labels, to be used in a similar vein.

I'm working on the next prototype, which has higher-bit ADCs and can be tested in the field.

The label system was the most important piece: it simplifies the problem enough to let me test different sensors and configurations.

7

sineiraetstudio t1_jdws2iv wrote

(The graph doesn't give enough information to determine whether it's actually becoming more confident in its high-confidence answers, but it sounds like a reasonable enough rationale.)

I'm not sure I understand what distinction you're trying to draw. The RLHF'd version assigns higher confidence to its answers than its actual accuracy warrants, unlike the original pre-trained version. That's literally the definition of overconfidence.

You might say that this is more "human-like", but being human-like doesn't mean it's good. If you only want the most likely answer, you can already get that via the sampler, while on the other hand calibration errors are a straight-up downside, as Paul Christiano explicitly mentions in the part you quoted. If you need accurate confidence scores (e.g. because you only want to act when you're certain), being well-calibrated is essential.
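
To make the calibration point concrete, here's a rough sketch (just illustrative, not from the paper) of the standard binned expected-calibration-error measure. A well-calibrated model that says 0.8 is right about 80% of the time; an overconfident one says 0.9 and is still right only about 80% of the time:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by stated confidence and compare the average confidence
    # in each bin to the empirical accuracy in that bin.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Model claims 90% confidence on five answers but only gets four right:
print(expected_calibration_error([0.9] * 5, [1, 1, 1, 1, 0]))  # ~0.1
```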

2

Badbabyboyo t1_jdwreio wrote

That’s awesome! Keep up the good work. Most people don’t realize we had taken voice-to-text technology about as far as it could go in the ’90s, and it wasn’t until it was combined with machine learning in the 2000s that it really improved to the point of being useful. A majority of future human-machine interfaces will probably have to be developed using machine learning, and this is a perfect example!

6

fmfbrestel t1_jdwmb7z wrote

Most of those problems are due to the input/memory limitations for general use. I can imagine locally hosted GPTs that have training access to an organization's source code, development standards, and database data structures. Such a system could be incredibly useful. Human developers would just provide the prompts, supervise, approve, and test new/updated code.

It would have to be locally hosted, because most orgs are NOT going to feed their source code to an outside agency, regardless of any promised efficiency gains.
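
Something like this is the shape I'm imagining (purely a sketch; the local endpoint, its JSON format, and the retrieval helper are made up for illustration):

```python
import pathlib
import requests

def build_context(repo_root, query, max_chars=4000):
    # Naive keyword retrieval over the org's own repo; a real system would use
    # embeddings, but the point is that the code never leaves the building.
    terms = [t.lower() for t in query.split()]
    chunks = []
    for path in pathlib.Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        if any(t in text.lower() for t in terms):
            chunks.append(f"# {path}\n{text[:1000]}")
        if sum(len(c) for c in chunks) > max_chars:
            break
    return "\n\n".join(chunks)

def ask_local_model(query, repo_root="./our_repo"):
    prompt = (
        "You are a coding assistant for our internal codebase.\n\n"
        f"{build_context(repo_root, query)}\n\n"
        f"Task: {query}\nProposed change:"
    )
    # Hypothetical locally hosted inference server -- nothing goes to an outside agency.
    resp = requests.post("http://localhost:8080/v1/completions",
                         json={"prompt": prompt, "max_tokens": 512})
    return resp.json()["choices"][0]["text"]
```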

2

JohnyWalkerRed OP t1_jdwjvxy wrote

Yeah, like the Databricks Dolly post is funny to me, because they're an enterprise software company and Dolly isn't really useful in the context they operate in. I guess they just wanted to get some publicity.

Looks like OpenAssistant, when mature, could enable this. Although it seems the precursor to an Alpaca-like dataset is an RLHF model, which itself needs a human-labeled dataset, so that bottleneck needs to be solved too.
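
Roughly, the bootstrapping step I mean looks like this (a sketch only; `query_instruction_model` is a hypothetical stand-in for whatever instruction-tuned / RLHF'd model you have access to):

```python
import random

def query_instruction_model(prompt):
    # Hypothetical: call whatever instruction-tuned model you have access to.
    raise NotImplementedError

seed_tasks = [
    {"instruction": "Summarize the following paragraph.", "output": "..."},
    {"instruction": "Write a SQL query that counts orders per user.", "output": "..."},
]

def generate_synthetic_examples(n=100):
    # Show the model a couple of seed tasks and ask it to invent a new one,
    # Alpaca/self-instruct style; the human-labeled seeds are the bottleneck.
    dataset = []
    for _ in range(n):
        demos = random.sample(seed_tasks, k=min(2, len(seed_tasks)))
        prompt = "Here are some example tasks:\n\n"
        for d in demos:
            prompt += f"Instruction: {d['instruction']}\nOutput: {d['output']}\n\n"
        prompt += "Write one new instruction and its output in the same format."
        dataset.append(query_instruction_model(prompt))
    return dataset
```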

9

muskoxnotverydirty t1_jdwjc1w wrote

And this method doesn't have some of the drawbacks seen in OP's prompting. Giving an example of an incorrect response followed by self-correction within the prompt may make it more likely that the initial response is wrong, since that's the pattern you're showing it.
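
For example, the first toy prompt below bakes "wrong first, then corrected" into the demonstrated pattern, whereas the second only asks for a review after the answer is produced (illustrative prompts only):

```python
# Demonstration that includes an incorrect answer followed by self-correction:
prompt_with_bad_demo = (
    "Q: What is 17 * 24?\n"
    "A: 398\n"
    "Wait, let me check... 17 * 24 = 408. Corrected answer: 408.\n\n"
    "Q: What is 23 * 19?\nA:"
)

# Alternative: get the answer first, then request a review in a second turn,
# so the model is never shown "being wrong first" as part of the pattern.
prompt_review_after = (
    "Q: What is 23 * 19?\n"
    "A: {model_answer}\n"
    "Review the answer above and correct it if it is wrong."
)
```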

2

was_der_Fall_ist t1_jdwdxut wrote

My understanding is that rather than being overconfident in their answers, they simply produce the answer they’re most confident in, instead of sampling each answer in proportion to how confident they are in it. This seems similar to how humans work: if you ask me a yes-or-no question and I’m 80% sure the answer is yes, I’m going to say “yes” every time; I’m not going to say “no” 20% of the times you ask, even though I assign a 20% chance that “no” is correct. In other words, the probability that I say yes is not the same as the probability I assign to yes being correct. But I admit there are subtleties to this issue with which I am unfamiliar.
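
A toy version of the distinction (illustrative numbers only):

```python
import numpy as np

probs = {"yes": 0.8, "no": 0.2}  # the confidence the model assigns internally

# What I (and greedy decoding) do: always give the most likely answer.
always_said = max(probs, key=probs.get)  # "yes", 100% of the time

# Sampling "in proportion to confidence" would instead look like this:
sampled = np.random.choice(list(probs), p=list(probs.values()))  # "no" ~20% of the time
```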

4