Recent comments in /f/deeplearning

suflaj t1_j731s6u wrote

I mean kernels in the sense of functions.

> Why wouldn't GPU parallelization make inference faster?

Because most DL models are deep, not exactly wide. As I've explained already, deep means a long serial chain: not parallelizable outside of data parallelism, which doesn't speed up inference, and model parallelism, which is generally not implemented and has heavy IO costs.

Wide models, and how they could be made equivalent to deep ones, are largely unexplored, although they are theoretically just as expressive.
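
A minimal PyTorch sketch of what I mean (the layer count and sizes are made up): each layer consumes the previous layer's output, so the chain is inherently serial, and splitting a batch across GPUs raises throughput without touching per-sample latency.

```python
import torch
import torch.nn as nn

# Toy "deep and narrow" model: 50 small layers chained in series.
# Layer k needs layer k-1's output, so the 50 matmuls run one after
# another no matter how many GPUs you throw at it.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(50)]
)

x = torch.randn(1, 256)        # batch size 1, typical online inference
with torch.no_grad():
    y = model(x)               # latency = the full serial chain

big_batch = torch.randn(1024, 256)
with torch.no_grad():
    ys = model(big_batch)      # data parallelism: more throughput,
                               # same per-sample latency through the chain
```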

1

Open-Dragonfly6825 OP t1_j72yyst wrote

Could you elaborate on some of the points you make? I have read the opposite of what you say regarding the following points:

  • Many scientific works claim that FPGAs have similar or better power (energy) efficiency than GPUs in almost all applications.
  • FPGAs are considered a good AI technology for embedded devices where low energy consumption is key. Deep Learning models can be trained somewhere else, using GPUs, and, theoretically, inference can be done on the embedded devices using the FPGAs, for good speed and energy efficiency. (Thus, FPGAs are supposedly well-suited for inference.)
  • Modern high-end (data center) FPGAs target 300 MHz as a base clock speed. It is not unusual for designs to achieve clock speeds higher than 300 MHz, though not much higher unless you heavily optimize the design and use some complex tricks to boost the clock.

The comparison you make about the largest FPGA being comparable only to small embedded GPUs is interesting. I might look more into that.

1

Open-Dragonfly6825 OP t1_j72s5ov wrote

One question: what do you mean by "kernels" here? Is it the CNN operation you apply to the layers? (As I said, I am not familiar with Deep Learning, and "kernels" means something else when talking about GPU and FPGA programming.)

I know about TPUs and I understand they are the "best solution" for deep learning. However, I did not mention them since I won't be working with them.

Why wouldn't GPU parallelization make inference faster? Isn't inference composed mainly of matrix multiplications as well? Maybe I don't understand very well how GPU training is performed and how it differs from inference.

1

Open-Dragonfly6825 OP t1_j72qtao wrote

That actually makes sense. FPGAs are very complex to program, even though the gap between software and hardware programming has been narrowed by High Level Synthesis (e.g. OpenCL). I can see how it is just easier to use a GPU that is simpler to program, or a TPU that already has compatible libraries that abstract away the low-level details.

However, FPGAs have been increasing in area and available resources in recent years. Is that still not enough circuitry?

1

Open-Dragonfly6825 OP t1_j72pzlc wrote

FPGAs are reconfigurable hardware accelerators. That is, you could theoretically "synthesize" (implement) any digital circuit on an FPGA, given that the FPGA has enough "resources".

This would let the user deploy custom hardware solutions for virtually any application, which could be far more optimized than software solutions (including GPUs).

You could implement tensor cores or a TPU using an FPGA. But, obviously, an ASIC is faster and more energy efficient than its equivalent FPGA implementation.

Linking to what you say: besides all the "this is just theory, in practice things are different" caveats around FPGAs, programming GPUs with CUDA is way easier than programming FPGAs as of today.

2

Open-Dragonfly6825 OP t1_j72om7m wrote

Maybe I missed it, but the posts I read don't specify that. Some scientific works claim that FPGAs are better than GPUs both for training and inference.

Why would you say they are better only for inference? Wouldn't a GPU be faster for inference too? Or is it just that inference doesn't require high speeds, so FPGAs are chosen for their energy efficiency?

1

AzureNostalgia t1_j7199dd wrote

Don't listen to anyone saying FPGAs are better than GPUs in AI. They don't know the platforms well enough.

FPGAs are obsolete for AI (training AND inference), and there are many reasons for that: less parallelism, worse power efficiency, no scaling, they run at around 300 MHz at best, and they don't have the ecosystem and support GPUs have (i.e. support for models and layers). Even the reduced-precision "advantage" they had is long gone; GPUs can do 8-bit and even FP8 now. Maybe the largest FPGA (for example, a Xilinx Alveo card) can be compared with a small embedded Jetson Xavier in AI (you can compare the performance results from each company to see for yourself).

Wonder why there are no FPGAs in MLPerf (the AI benchmark that has become the standard)? Yeah, you guessed it. Even Xilinx realized how bad FPGAs are for AI and stopped their production for this reason. They created the new Versal series, which are not even FPGAs; they are more like GPUs (specifically, they work like Nvidia Tensor Cores for AI).

To sum up, FPGAs are worse in everything when compared with GPUs. Throughput, latency, power efficiency, performance/cost, you name it. Simple as that.

2

yannbouteiller t1_j70o6y3 wrote

FPGAs are theoretically better than GPUs for deploying Deep Learning models simply because they are theoretically better than anything at doing anything. In practice, though, you never have enough circuitry on an FPGA to efficiently deploy a large model, and they are not targeted by the main Deep Learning libraries, so you have to do the whole thing by hand: quantizing your model, extracting its weights, coding each layer in embedded C/VHDL/etc., and doing most of the hardware optimization yourself. It is tedious enough that plug-and-play solutions like GPUs/TPUs are preferable in most cases, including embedded systems.
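
To give a sense of the "by hand" part, here is a rough sketch of just the weight-extraction and quantization step (symmetric per-tensor int8 is one simple choice; real designs get more elaborate):

```python
import numpy as np
import torch.nn as nn

# Toy layer standing in for one layer of a trained model.
layer = nn.Linear(128, 64)

# Extract the float32 weights...
w = layer.weight.detach().numpy()

# ...and quantize them to int8 with a symmetric per-tensor scale.
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# w_int8 and scale are what get baked into the VHDL/embedded C design;
# the hardware does int8 MACs and rescales the accumulator by `scale`.
print(w_int8.shape, scale)
```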

4

BellyDancerUrgot t1_j6zyiqm wrote

I'll be honest, I don't really know what FPGAs do or how they do it (I reckon they are an ASIC for matrix operations?), but tensor cores already provide optimization for matrix/tensor operations, and fp16 and mixed precision have been available for quite a few years now. Ada and Hopper even enable insane performance improvements for fp8 operations. Is there any real, verifiable benchmark that compares training and inference time of the two?
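
For reference, the mixed-precision path on tensor cores is only a few lines in PyTorch these days. A minimal sketch with a toy model and made-up shapes:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # scales loss to avoid fp16 underflow

x = torch.randn(64, 1024, device=device)
target = torch.randn(64, 1024, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):  # matmuls run in reduced precision on tensor cores
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```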

On top of that, there's the obvious CUDA monopoly that Nvidia keeps a tight leash on. Without software, even the best hardware is useless, and almost everything is optimized to run on a CUDA backend.

0

suflaj t1_j6zq1k9 wrote

Well, one reason I can think of is custom kernels. To really get the most out of your model's performance, you will likely be optimizing the kernels you use for your layers, sometimes fusing them. A GPU can't adapt to that as well. The best you can do is use TensorRT to optimize for a specific model of GPU, but why do that when you can create, e.g., the optimal CNN kernel in hardware on an FPGA? On a GPU you can only work with the hardware that came with the GPU.
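
A concrete example of the kind of fusion I mean, just in software: folding a BatchNorm into the preceding convolution so two ops become one kernel (a sketch, eval-mode only, toy shapes):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an eval-mode BatchNorm into the preceding Conv2d: two ops become one kernel."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # per output channel
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
conv.eval(); bn.eval()
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-4)
```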

That being said, this is in regard to processing, not necessarily scaling it up. And maybe it makes sense for inference, where it would be nice to have a processor built specifically to run some architecture, one that doesn't necessarily process things in large batches.

But for training, obviously nothing is going to beat a GPU/TPU cluster, because of pricing and the seemingly infinite scaling of GPUs. If money is not a problem, you can always just buy more GPUs and your training will be faster. But parallelization will probably not make your inference faster, since the "deep" in DL refers to the long serial chain of processing, and that's where a hardware implementation of the optimized model makes sense.
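
A sketch of that scaling argument with plain data parallelism (assuming multiple GPUs are visible; in real setups you'd use DistributedDataParallel):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Training: each batch is split across every visible GPU, so more GPUs means
# bigger effective batches per step and faster wall-clock training.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

batch = torch.randn(4096, 512, device=device)
logits = model(batch)            # sharded across GPUs during training

# A single inference request (batch size 1) has nothing to shard, so extra
# GPUs do nothing for its latency; only the serial depth matters there.
single = torch.randn(1, 512, device=device)
with torch.no_grad():
    pred = model(single)
```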

Ideally, though, you'd want a TPU, not FPGA processors. TPUs are cheaper and you can use them for research as well.

5

Vegetable-Skill-9700 OP t1_j6zpscd wrote

Firstly, by measuring data drift and analyzing user behavior, UpTrain identifies which prompts/questions were unseen by the model or the cases where the user was unsatisfied with the model output. It automatically collects those cases for the model to retrain upon.

Secondly, you can use the package to define a custom rule and filter out relevant data sets to retrain ChatGPT for your use case.

Say you want to use an LLM to write product descriptions for Nike shoes and have a database of Nike customer chats:
a) Rachel - I don't like these shoes. I want to return them. How do I do that?
b) Ross - These shoes are great! I love them. I wear them every day while practicing unagi.
c) Chandler - Are there any better shoes than Nike? 👟 😍
You probably want to filter out cases with positive sentiment or cases with lots of emojis. With UpTrain, you can easily define such rules as a Python function and collect those cases.
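
As a rough illustration of such a rule in plain Python (the function name, toy sentiment word list, and emoji check are made up for this example; this is not UpTrain's actual API):

```python
import re

EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
POSITIVE_WORDS = {"great", "love", "awesome", "perfect"}   # toy sentiment lexicon

def keep_for_finetuning(chat: str) -> bool:
    """Toy filter rule: drop chats that are clearly positive or emoji-heavy."""
    text = chat.lower()
    has_positive = any(w in text for w in POSITIVE_WORDS)
    emoji_heavy = len(EMOJI_RE.findall(chat)) >= 2
    return not (has_positive or emoji_heavy)

chats = [
    "I don't like these shoes. I want to return them. How do I do that?",
    "These shoes are great! I love them. I wear them every day while practicing unagi.",
    "Are there any better shoes than Nike? 👟 😍",
]
print([c for c in chats if keep_for_finetuning(c)])   # keeps only Rachel's chat
```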

I am working on an example highlighting how all the above can be done. It should be done in a week. Stay tuned!

2

Vegetable-Skill-9700 OP t1_j6zpmiy wrote

Hey, so this typically happens when there is a change in vocabulary. Just sharing my experience with this issue: we built a chatbot to answer product onboarding queries, and with a new marketing campaign we got a big influx of a younger audience. Their questions were generally accompanied by a lot of urban slang and emojis, which our NLP model wasn't equipped to handle, causing the performance to deteriorate.
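
A rough sketch of how you can catch that kind of vocabulary drift before it hurts (the function name and toy vocabulary are made up; this isn't any particular library's API):

```python
import re

TRAIN_VOCAB = {"how", "do", "i", "reset", "my", "password", "account", "billing"}  # toy vocab
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF]")

def drift_score(query: str) -> float:
    """Fraction of tokens (including emojis) the training vocabulary has never seen."""
    tokens = re.findall(r"\w+", query.lower()) + EMOJI_RE.findall(query)
    if not tokens:
        return 0.0
    unseen = [t for t in tokens if t not in TRAIN_VOCAB]
    return len(unseen) / len(tokens)

queries = ["How do I reset my password", "yo this app is lowkey bussin 🔥🔥"]
flagged = [q for q in queries if drift_score(q) > 0.5]   # candidates for retraining data
print(flagged)
```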

2

BlacksmithNo4415 t1_j6x2xia wrote

I've checked for papers that do exactly what you want.

So, as I assumed, this data is time sensitive and therefore you need an additional temporal dimension.

The model needs to be more complex in order to solve this problem.

I suggest reading this:

https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01736-y


BTW: have you tried grid search for finding the right hyperparameters?
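
A minimal sketch of what I mean, with a placeholder `evaluate` function you'd swap for your own training + validation run (the grid values are just examples):

```python
from itertools import product

# Toy hyperparameter grid.
grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64],
    "dropout": [0.1, 0.3],
}

def evaluate(lr, batch_size, dropout):
    """Placeholder: train the model with these settings and return validation accuracy."""
    return 0.0  # replace with a real training + validation run

best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=lambda cfg: evaluate(**cfg),
)
print("best config:", best)
```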

Oh, and your model does improve.

Have you increased the dataset size?

1