Recent comments in /f/MachineLearning
aadityaubhat OP t1_jdxhe79 wrote
Reply to comment by addandsubtract in [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉 by aadityaubhat
Sure! Currently it supports aggregation and visualization; I'm working on adding more functionality.
The core process of Auto-Analyst consists of several steps:
- Parsing the data, description, and question: The tool takes your data and a plain English question as input, then parses and understands the context.
- Basic data cleaning: Before diving into the analysis, Auto-Analyst cleans the data to ensure it's ready for processing.
- Determining the answer type: Based on the input question, Auto-Analyst figures out if the answer can be provided through aggregation or visualization.
- Aggregation: If the question requires an aggregated answer, Auto-Analyst leverages the OpenAI API to generate an SQL query. It then tries running the query on the data. If it fails, the OpenAI API is used to correct the query. This process continues until a working query is obtained or the user-defined maximum number of tries is reached (see the sketch after this list). The aggregation results are then returned to the user.
- Visualization: If the question calls for a plot, Auto-Analyst first identifies the aggregated data needed for the visualization. It uses the aggregation steps described above to obtain this data. Next, it employs the OpenAI API to generate Python code for the plot and returns the visualization to the user.
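A minimal sketch of the aggregation retry loop described above (illustrative only, not the actual Auto-Analyst code; `ask_llm` is a hypothetical helper standing in for the OpenAI API call, and SQLite stands in for whatever engine runs the query):

```python
import sqlite3

def aggregate_with_retries(ask_llm, question, schema, conn, max_tries=3):
    """Generate a SQL query for the question and retry on failure,
    feeding the error back to the model, up to max_tries attempts."""
    prompt = f"Schema: {schema}\nQuestion: {question}\nWrite one SQL query."
    for _ in range(max_tries):
        query = ask_llm(prompt)  # hypothetical wrapper around the OpenAI API
        try:
            return conn.execute(query).fetchall()  # working query: return rows
        except sqlite3.Error as err:
            # Append the failing query and error so the model can correct it.
            prompt += f"\nPrevious query: {query}\nError: {err}\nPlease fix it."
    raise RuntimeError("no working query within the retry budget")
```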
gBoostedMachinations t1_jdxh438 wrote
Confabulate. It confabulates. It doesn’t hallucinate. I can’t believe “hallucinate” is the word that stuck lol. Usually computer scientists and tech bros are cleverer than this.
suflaj t1_jdxh3kc wrote
Reply to comment by vintergroena in [D] Can we train a decompiler? by vintergroena
That would be breaching copyright. Depending on the company and the product, you'd get anywhere from a pretty nasty debt to straight up ruining your whole life (and potentially the lives of your family and people associated with you).
The same way you wouldn't steal from the mob, you would not steal from a company that makes money on a product FOSS can't compete with. Aside from that, decompilers have existed for a very long time, yet we have not witnessed such vigilantism.
[deleted] t1_jdxgyy0 wrote
Reply to comment by addandsubtract in [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉 by aadityaubhat
[deleted]
learn-deeply t1_jdxgxsx wrote
Reply to comment by Smallpaul in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
This isn't correct, at least in the US. AI-generated material is not considered copyrightable unless there has been significant human involvement.
suflaj t1_jdxgd3z wrote
Reply to [D] Can we train a decompiler? by vintergroena
In most cases yes, but inherently no. Understand that compilers, as part of their optimization step, might compile high-level code into something that you can't really connect back to the actual source. Part of the information is lost in the optimization step, so in the general case you will not be able to revert compilation. At least not fully: you will be able to get something resembling the solution, but it is not guaranteed to be the exact code that compiled into your starting input.
This is, of course, before even considering that you cannot recover dead source code that was never compiled into anything. Even if a language does not otherwise optimize the source, merely discarding dead code already loses information.
This also disregards name mangling. Name mangling can be done in ways that lose information, but that is probably a minor point, since concrete entity names are not that relevant.
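A concrete illustration of that information loss, using CPython's constant folding as a stand-in for a real compiler's optimizer (my own sketch, not from the thread):

```python
import dis

# CPython folds constants at compile time, so both statements compile to
# the same LOAD_CONST 5 bytecode; the original expression is unrecoverable.
dis.dis(compile("x = 2 + 3", "<folded>", "exec"))
dis.dis(compile("x = 5", "<literal>", "exec"))
```

A decompiler for this bytecode can only ever emit one of the two sources, so inverting compilation exactly is impossible in general.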
tamilupk OP t1_jdxfyf7 wrote
Reply to comment by New-Act1498 in [D] Will prompting the LLM to review its own answer be of any help to reduce chances of hallucinations? I tested a couple of tricky questions and it seems it might work. by tamilupk
Looks like OpenAI has tried it; check the response to the first reply or the GPT-4 paper appendix.
Smallpaul t1_jdxf3u3 wrote
Reply to comment by ninjasaid13 in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Probably not legally different than a document you created with a word processor.
addandsubtract t1_jdxeqdo wrote
Reply to [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉 by aadityaubhat
Can you go into more details as to what this does and how it works?
[deleted] t1_jdxeb6k wrote
Reply to [P] 🎉 Announcing Auto-Analyst: An open-source AI tool for data analytics! 🎉 by aadityaubhat
[deleted]
jms4607 t1_jdxd3hv wrote
Reply to [D] Will prompting the LLM to review its own answer be of any help to reduce chances of hallucinations? I tested a couple of tricky questions and it seems it might work. by tamilupk
Makes me wonder if you could fine-tune by just incentivizing the first answer to match the reviewed one, with a general accuracy/review reward.
Chabamaster t1_jdxaqdd wrote
The fact that people call wrong answers "hallucinations" now seems very weird to me, because it sounds like a marketing term to make the model seem smarter/conscious.
konstantin_lozev OP t1_jdxahve wrote
Reply to comment by a_marklar in [D] 3d model generation by konstantin_lozev
Thanks! That looks promising, indeed, if somewhat narrow.
ninjasaid13 t1_jdx9x3n wrote
Reply to comment by Smallpaul in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
can you even copyright a dataset generated by an AI?
a_marklar t1_jdx7ryn wrote
Reply to [D] 3d model generation by konstantin_lozev
In a limited sense, we're already there. For example, Microsoft's avatar generation.
I'd guess that it's very unlikely that generative models will use triangles. Point clouds, SDFs, and parametric surfaces all seem to be better data formats for these types of things. Those can all be converted to triangle meshes if that's required.
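To make the SDF option concrete (a minimal NumPy sketch of my own, not tied to any particular generative model): a shape is just a function returning signed distance to its surface, which is easy for a network to represent and can be meshed afterwards, e.g. with marching cubes.

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance from points of shape (N, 3) to a sphere's surface:
    negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

# Sample the SDF on a 32^3 grid; a mesher such as marching cubes can then
# extract a triangle mesh from the zero level set if triangles are needed.
axes = [np.linspace(-2.0, 2.0, 32)] * 3
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
values = sphere_sdf(grid.reshape(-1, 3)).reshape(32, 32, 32)
```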
AndreasVesalius t1_jdx7r37 wrote
Reply to comment by bpooqd in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
It’s just a bunch of if/else statements
ReasonablyBadass t1_jdx7f88 wrote
Reply to [D] Simple Questions Thread by AutoModerator
I still remember the vanishing/exploding gradient problem. It seems to be a complete non-issue now. Was it just ReLUs and skip connections that solved it?
onequark t1_jdx5o96 wrote
Awesome, does it support token streaming?
12max345 t1_jdx4sdm wrote
Reply to [D] GPT4 and coding problems by enryu42
I feel like LLMs have encoded a sort of law of language in their latent space through text and respond accordingly. But anything that follows a law isn't thereby conscious; for example, inanimate objects follow the laws of physics, and that doesn't indicate intelligent behaviour.
After all, text is a medium for representing our thoughts; it's the thoughts that matter, not the medium.
The concepts of causality, fundamental reality, and decision making are much more than following the laws of a language, which is just a means.
These LLMs can't question you unless you explicitly ask them to, and they can't interject. Knowledge was never consciousness; it's these abilities that compose consciousness.
I don't know how much sense I make to others, or maybe I'm at a loss for good words. In a nutshell: any model that fundamentally predicts tokens based on the weighting of previous tokens can never achieve consciousness.
[deleted] t1_jdx1tgn wrote
Reply to comment by muskoxnotverydirty in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
[deleted]
[deleted] t1_jdx0jo8 wrote
Reply to comment by Colecoman1982 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
[deleted]
lxe t1_jdx0adz wrote
Reply to [N] Predicting Finger Movement and Pressure with Machine Learning and Open Hardware Bracelet by turfptax
I've been using ChatGPT to help me with machine learning and data transformation as well. I knew very little of the field, and now with its help I feel like I have a superpower.
meister2983 t1_jdx06k9 wrote
Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Yes. RLHF increases accuracy on certain tests while decreasing calibration on others.
was_der_Fall_ist t1_jdwz4qw wrote
Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
I think you make a good point. We probably need better methods of post-training LLMs. But it does seem like the current regime is still sometimes more useful than the pre-trained model, which Christiano also says; it's only in some contexts that this behavior is worse. I'm not sure it's really better than top-p sampling, though. Still, RLHF models do seem pretty useful.
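For reference, top-p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability exceeds p, renormalizes, and samples from that set. A minimal sketch of my own (not any particular library's implementation):

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Nucleus sampling over a 1-D array of token probabilities."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]              # most probable tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # size of the nucleus
    nucleus = order[:cutoff]
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=renormalized)   # sampled token id
```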
rshah4 t1_jdxhesh wrote
Reply to [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
It’s possible to pay one of the labeling companies for an instruction dataset. Right now most companies aren’t donating 50k+ example datasets to the public, but I expect this will change soon.