Recent comments in /f/deeplearning

Horneur OP t1_j1zwo77 wrote

Reply to comment by Nater5000 in Making an AI play LoL by Horneur

Indeed, I am in my last year of high school. But since I'm planning to go to a general engineering school and therefore won't be able to specialize in DL, I'll have to learn it by myself if I want to do it.
Also, I want to do some smaller projects, but I don't have any ideas for things that would be fun. I would like to build a deep learning AI for a simpler team/duo game, but I don't know which one is easy enough. If you have any ideas, I would like to hear them.

2

Nater5000 t1_j1zuk4x wrote

Based on your post, I think it's safe to assume you're a few years away from being able to even properly plan for such a project.

I hate to discourage you from at least trying, but the task you're aiming to accomplish likely needs at least a small team of PhD-level researchers and resources to accomplish. You may want to start way smaller to get a grip on what's out there and how things work before taking on a challenge like training an RL agent to play LoL.

6

knight1511 t1_j1w32f7 wrote

What you are asking for borders on semantic search. That is, you are not looking for the exact phrase but for something that semantically means the same thing. Unfortunately, this is not something Elasticsearch can do for you out of the box.

Traditional search engines work by matching the exact words of your query and looking for their occurrences in the documents. They do this by creating an inverted index, which is nothing but a lookup table from each word/token to the documents that contain it, built over all the documents you want to index. When a query comes in, they use some similarity algorithm to score the indexed documents against the words/tokens present in the query. They then return the documents ranked from most to least similar based on that score. The semantics or the “meaning” of the text is not considered.

If your query has some overlapping words with the document you are looking for, then sure, you will get some relevant documents back. But if there are NO words in common, this will not work. For example, you can't expect it to return a document with the phrase “the computer is not working” for the query “the pc is broken”.
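To make the lookup-table idea concrete, here is a minimal sketch of an inverted index in plain Python. The function names, the toy documents, and the scoring rule (count of distinct matching tokens) are assumptions made for illustration; real engines use weighted scoring such as BM25 and much more careful tokenization.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query tokens they contain."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Toy documents, made up for this example.
docs = {
    "a": "The computer is not working",
    "b": "Hardware is the physical part of the computer",
    "c": "Deep learning models need GPUs",
}
index = build_inverted_index(docs)

results = search(index, "computer hardware")  # shared words, so documents are found
misses = search(index, "pc broken")           # same meaning as "a", but no shared words
```

Here `results` ranks document "b" ahead of "a", while `misses` is empty: the index only knows which tokens occur where, so a paraphrase with different content words finds nothing.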

What you are looking for is semantic search. There are some pre-trained language models on HuggingFace whose embeddings can be used as a search index, and there is an open-source library by Facebook called FAISS that allows you to search it. But the specifics of this are highly dependent on your use case. Also, the implementation is a bit more complex and will require some coding expertise and an understanding of ML. You will need professional help unless you already know this stuff or are willing to learn.
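As a rough illustration of what searching an embedding index means, the sketch below ranks documents by cosine similarity to a query vector. The 3-dimensional vectors are made up by hand; in a real setup they would come from a pre-trained model and have hundreds of dimensions, and a library like FAISS would replace the brute-force loop. The function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec, doc_vecs, k=1):
    """Return the k document keys whose vectors are most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc: cosine_similarity(query_vec, doc_vecs[doc]),
        reverse=True,
    )
    return ranked[:k]

# Hand-made stand-ins for model embeddings (assumption, not real model output).
doc_vecs = {
    "the computer is not working": [0.9, 0.1, 0.2],
    "the weather is nice today": [0.1, 0.9, 0.1],
}
# Stand-in for the embedding of the query "the pc is broken".
query_vec = [0.8, 0.2, 0.3]

best = nearest(query_vec, doc_vecs)
```

Because the ranking depends only on vector closeness, the query can match a document without sharing a single word with it, provided the model places the two texts near each other.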

I have done this before. But to be honest, this is a time-consuming task and not something to be done for free ;)

1

iacoposk8 OP t1_j1udt2j wrote

I would like to search only in the content of a text, but in an intelligent way.

So if in the text file it says: "Hardware is the physical part of the computer and software is the logical part"

and to find it I wrote: "Hardware is the part of the computer that we can touch while software is the programs"

It should be able to find it for me anyway, right?

1

knight1511 t1_j1u3uv3 wrote

Also if you are adept at coding in Java, you can look at the Apache Lucene or Solr library. This is what most search engines use behind the scenes. It is low level but it allows you to configure your program to do exactly what you want it to do.

By the way, do you want to search the filenames as well or just the content inside the files?

1

knight1511 t1_j1u3lik wrote

Elasticsearch is free if you self-host it. You are only charged if you want to rely on their infrastructure or on some advanced features. From what you're saying, it sounds like an inverted index is more than enough for your use case, and that is a free feature.

I have used elasticsearch for simple use cases via docker. https://hub.docker.com/_/elasticsearch

Just spin up the container and index the extracted text data. Then you can run queries against it.
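A minimal sketch of those two steps against Elasticsearch's REST API, using only the Python standard library. It assumes the container is reachable at `http://localhost:9200` with security disabled (recent versions enable HTTPS and authentication by default), and the index name `documents` and field name `content` are hypothetical choices.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local container, security disabled
INDEX = "documents"               # hypothetical index name

def index_request(doc_id, text):
    """Build the HTTP request that stores one document in the index."""
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

def search_request(query):
    """Build the HTTP request for a full-text match query on `content`."""
    body = json.dumps({"query": {"match": {"content": query}}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

def send(request):
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the container running, `send(index_request("1", extracted_text))` would store a document, and `send(search_request("physical part of the computer"))["hits"]["hits"]` would return the ranked matches.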

As for Papermerge, I am not sure. I heard about it some time back, and it looks like the project is abandoned or updated infrequently.

1

iacoposk8 OP t1_j1tz1yy wrote

I looked at Elasticsearch, but it's quite expensive. Are there any free or cheaper alternatives? I also tried Papermerge, but the search doesn't work, even if I try to search for exact phrases or the file name. Is there something I may be forgetting to do? The file is already text, without images.

1

knight1511 t1_j1qf7dz wrote

Is the text in image format or can it be directly extracted in digital format?

If it is in digital format, then you can extract the text directly by using pdfminer. It has packages available in Java and Python.

If the PDF has images inside it, you need to OCR the text first. ocrmypdf is a very handy Python package that uses Google's Tesseract OCR engine to convert images into digital characters. It is not a perfect process, but if the images are of good quality then it is almost perfect. Once you have the text in digital format, it can be indexed by a search engine.
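The two cases above could be combined roughly like this. This is a sketch, assuming the `pdfminer.six` and `ocrmypdf` packages are installed (ocrmypdf also needs Tesseract on the system); the helper names and the "fewer than 20 characters means scanned" heuristic are hypothetical.

```python
def extract_pdf_text(pdf_path):
    """Return the text layer of a PDF using pdfminer.six."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six
    return extract_text(pdf_path)

def has_text_layer(text, min_chars=20):
    """Hypothetical heuristic: near-empty output suggests a scanned PDF."""
    return len(text.strip()) >= min_chars

def add_text_layer(input_pdf, output_pdf):
    """Run OCR on a scanned PDF and write a searchable copy."""
    import ocrmypdf  # pip install ocrmypdf; Tesseract must be installed too
    ocrmypdf.ocr(input_pdf, output_pdf)

def pdf_to_text(pdf_path, ocr_path):
    """Extract text directly, falling back to OCR when no text is found."""
    text = extract_pdf_text(pdf_path)
    if not has_text_layer(text):
        add_text_layer(pdf_path, ocr_path)
        text = extract_pdf_text(ocr_path)
    return text
```

The imports sit inside the functions so that a purely digital PDF only requires pdfminer, and the OCR dependency is touched only when it is actually needed.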

To search through the text, you can simply use a ready-to-use search engine like Elasticsearch. You just need to supply the extracted text to the engine to be indexed. Then you can query it easily.

One easy way to use Elasticsearch is via Docker. It's easy to get started with, provided you are already familiar with Docker.

Edit: Alternatively you can explore free and open source software called Papermerge. link

3

chengstark t1_j1gqy3z wrote

Look up some model compression techniques, use smaller batch sizes, etc. Sorry for your situation; it is very hard to do proper work without the proper tools.
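As one example of what a compression technique does, here is a minimal sketch of symmetric int8 weight quantization in plain Python. This is only one of many techniques, picked for illustration; the function names and the toy weights are assumptions, and real frameworks quantize whole tensors with more careful calibration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 1.0]  # toy values, made up for this example
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each weight is then stored in one byte instead of four (for float32), at the cost of a rounding error of at most half the scale per weight.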

2

ReallySeriousFrog t1_j1e12c3 wrote

Oh man, I wasn't aware - how do your supervisor, professors, and postdocs in the AI department use large models? Maybe they can share a bit of their compute with you?

Sadly, I am not aware of any free server space with enough GPU/RAM.

Alternatively, is it maybe an option to use a smaller version of the model? Maybe there is a model with trimmed weights?

1