bsnnkv a day ago

Been a happy user of MS in production for https://notado.app for many years, and someone from MS even reached out to me a few years ago thanking me for my write-up of syncing Postgres records to MS[1], saying they used it as a reference for something they later shipped.

I haven't kept up with the latest updates, all these new AI references don't inspire confidence at all, but the older version I'm running is chugging along and doing a great job.

[1]: https://notado.substack.com/p/how-notado-syncs-data-from-pos...

  • irevoire a day ago

    AI is completely opt-in. As long as you don't specify an embedder in the settings, you can continue using the engine as usual. There was no breaking change, and performance improved a lot in v1.12, so you might want to give it a try!

  • nkmnz 13 hours ago

    May I ask why you chose to write and deploy a whole service in Go instead of using pgsql-http to make synchronous HTTP requests directly from Postgres to Meilisearch? That would also remove the need for listen/notify.

    • bsnnkv 7 hours ago

      I hadn't heard of it back in 2019, and even now that I look at it for the first time, I still think I'd rather maintain a separate sync service than maintain a Postgres instance with custom extensions and write HTTP requests in SQL.
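      For anyone curious what the listen/notify sync pattern looks like, here's a rough Python sketch (the channel, index, and field names are made up for illustration; the linked write-up has the real details):

```python
# Rough sketch of a Postgres listen/notify -> Meilisearch sync service.
# Channel, index, and field names here are hypothetical.
import json
import select

def payload_to_document(payload: str) -> dict:
    """Turn a NOTIFY payload (JSON emitted by a row trigger) into a
    Meilisearch document keyed on the row's primary key."""
    row = json.loads(payload)
    return {
        "id": row["id"],  # Meilisearch primary key
        "title": row.get("title", ""),
        "body": row.get("body", ""),
    }

def sync_forever(dsn: str, meili_url: str, api_key: str) -> None:
    # Third-party deps (psycopg2, requests) imported lazily so the
    # pure helper above stays dependency-free.
    import psycopg2
    import psycopg2.extensions
    import requests

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("LISTEN bookmarks_changed;")
    while True:
        # Block until Postgres has a notification for us.
        select.select([conn], [], [], 5)
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            requests.post(
                f"{meili_url}/indexes/bookmarks/documents",
                headers={"Authorization": f"Bearer {api_key}"},
                json=[payload_to_document(note.payload)],
            )
```

      The pgsql-http alternative discussed above would move the POST into a SQL trigger instead of a standalone process.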

  • amelius a day ago

    If I may ask, how many searches per day over what volume of data?

    • bsnnkv 15 hours ago

      It's a bookmarking service so people rarely search - they just throw stuff into a bucket that they'll probably never go back to for the rest of their lives :)

      I picked MS primarily because I was impressed with the quality of search results across multiple languages - even if this were a more search-heavy service, I would probably still pick MS because high quality multilingual search results will always have the highest importance for me as a multilingual person.

adeptima a day ago

Meilisearch is great, used it for a quick demo

However, if you need full-text search similar to Apache Lucene, my go-to options are based on Tantivy

Tantivy https://github.com/quickwit-oss/tantivy

Asian language support, BM25 scoring, natural query language, and JSON field indexing are all must-have features for me

Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart

ParadeDB - https://github.com/paradedb/paradedb

I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Any thoughts on up-to-date hybrid search experience are greatly appreciated

  • jitl a day ago

    Quickwit was bought by Datadog, so I feel there's some risk quickwit-oss becomes unmaintained if Datadog's corporate priority shifts in the future, or OSS maintenance stops providing return on investment. Based on the Quickwit blog post, they are relicensing to Apache2 and releasing some enterprise features, so it seems very possible the original maintainers will move to other things, and it's unclear if enough community would coalesce to keep the project moving forward.

    https://quickwit.io/blog/quickwit-joins-datadog#the-journey-...

    • iambateman a day ago

      I have an implementation of Quickwit, so I've thought about this.

      The latest version is stable and fast enough that I think this won't be an issue for a while. It's the kind of thing that does what it needs to do, at least for me.

      But I totally agree that the project is at risk, given the acquisition.

  • kk3 a day ago

    As far as combining full-text search with embedding vectors goes, Typesense has been building features around that - https://typesense.org/docs/28.0/api/vector-search.html

    I haven't tried those features, but I did try Meilisearch a while back, and I found Typesense to index much faster (which was a bottleneck for my particular use case) and also to have many more features to control search/ranking. That said, my use case was not typical for search and I'm sure Meilisearch has come a long way since then, so this is not to speak poorly of Meilisearch; Typesense is just another great option.

    • jimmydoe 3 hours ago

      +1, Typesense is really fast. The only drawback is that startup is slow when the index gets larger. The good thing is that full-text search (excluding vectors) is a relatively stable feature set, so if your use case is just FTS, you won't need to restart very often for version upgrades.

    • Kerollmops a day ago

      Meilisearch recently improved indexing speed and simplified the update path: v1.12 brought a much faster indexer [1], and the dumpless upgrade feature [2] improved the upgrade path.

      The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.

      That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid costs and remove the indexing time.

      [1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

      • kk3 15 hours ago

        Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once, and then afterwards reduce the machine resources. Typesense intends to persist the index to disk eventually, but it's not on the roadmap yet. With the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!

    • irevoire a day ago

      I hate the way Typesense does its "hybrid search". It's called fusion search, and the idea is that, having no idea how well the semantic and full-text searches are each doing, you mix them together without looking at all at the results either search is returning.

      I tried to explain to them in an issue that in this state it was pretty much useless, because one search strategy or the other would always give you awful results, but they basically said "some other engines are doing that as well, so we won't try to improve it", plus a ton of justification, instead of just admitting that this strategy is bad.

      • jabo a day ago

        We generally tend to engage in in-depth conversations with our users.

        But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

        For what it’s worth, the approach used in Typesense is called Reciprocal Rank Fusion (RRF) and it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.
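        For readers who haven't met it, RRF is simple enough to fit in a few lines: each result list contributes 1/(k + rank) per document, the contributions are summed across lists, and documents are reordered by the sum (k = 60 is the conventional constant from the original paper; this is a generic sketch, not Typesense's actual code):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: fuse several ranked lists by summing
    1 / (k + rank) for each document across all lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["a", "b", "c"]  # from full-text search
vector_hits = ["b", "d", "a"]   # from semantic search
print(rrf([keyword_hits, vector_hits]))  # ['b', 'a', 'd', 'c']
```

        Note that, as debated in this thread, RRF only looks at ranks, never at the underlying relevancy scores.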

        • irevoire a day ago

          > But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

          Well, in this case I was just trying to be a normal user who wants the best relevancy possible and couldn't find a solution. But the reason I couldn't find one was not that you didn't want to spend more time on my case; it was that Typesense provides no solution to this problem.

          > it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.

          Yeah, cool, or in other words: "it's bad, we know it and we can't help you, but it's the state of the art, go educate yourself". But guess what: Meilisearch may need some fine-tuning around your model and so on, but in the end it gives you the tools to build a proper hybrid search that knows the quality of the results before mixing them.

          If other people want to see the original issue: https://github.com/typesense/typesense/issues/1964

          • spiderfarmer a day ago

            I think this is a good example of why people should disclose their background when commenting on competing products/projects. Even if the intentions were sound, which seems to be the case here, upfront disclosure would have given the conversation more weight and meaning.

  • inertiatic a day ago

    >I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

    Start off with ES or Vespa, probably. ES is not hard at all to get started with, IMO.

    Try RRF - see how far that gets you for your use case. If it's not where you want to be, time to get thinking about what you're trying to do. Maybe a score multiplication gets you where you want to be - you can do it in Vespa I think, but you have to hack around the inability to express exactly that in ES.

  • navaed01 18 hours ago

    I’m using Typesense hybrid search, it does the job, well priced and is low-effort to implement. Feel free to ask any specific questions

    • Kerollmops 13 hours ago

      You should try Meilisearch then, you'll be astonished by the quality of the results and the ease of setup.

  • Kerollmops a day ago

    > I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

    You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].

    [1]: https://wheretowatch.meilisearch.com/

    • oulipo a day ago

      why couldn't it be possible to just embed Meilisearch/Tantivy/Quickwit inside Postgres as a plugin to simplify the setup?

      • Kerollmops a day ago

        > [..] to simplify the setup?

        It would be simpler to keep Meilisearch and its key-value store out of Postgres' WAL and such, and instead to offer a good SQL exporter (which is in the plan).

        • oulipo 11 hours ago

          Perhaps on a technical level, but for a dev, if I just need to install Postgres and some plugins and, boom, I have a fully searchable index, that's even easier

justAnotherHero a day ago

We have been using Meilisearch with Firebase for years and it has always worked great. I just wish they would update the extension on the Firebase extensions hub[1], because the current version available uses Node 14, which is not supported by Cloud Functions on GCP, so the extension is not usable at all. What's weird is that the latest version available in their repo has upgraded the Node version, but they are not offering it in the extensions hub.

[1]: https://extensions.dev/extensions/meilisearch/firestore-meil...

softwaredoug 21 hours ago

One thing to _always_ dig into is how your hybrid search solution filters the vector search index. This is not at all standardized and often overlooked, but when you want "top X most similar to the query by embedding, but also in Y category/matching Z search terms", it's the core operation your hybrid search is doing.

Here's a rollup of algorithms... https://bsky.app/profile/softwaredoug.bsky.social/post/3lmrm...
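To make the distinction concrete, here's a toy brute-force sketch of the two common strategies, pre-filtering vs. post-filtering (made-up documents and field names; real engines do this inside an ANN index, which is where it gets hard):

```python
# Toy illustration of pre- vs post-filtering in filtered vector search.
# Documents, categories, and vectors are made up for illustration.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

DOCS = [
    {"id": 1, "category": "movies", "vec": [1.0, 0.0]},
    {"id": 2, "category": "books",  "vec": [0.9, 0.1]},
    {"id": 3, "category": "movies", "vec": [0.0, 1.0]},
]

def prefilter_knn(query, category, k):
    # Filter first, then rank: always returns k hits if the
    # category has that many.
    candidates = [d for d in DOCS if d["category"] == category]
    candidates.sort(key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:k]]

def postfilter_knn(query, category, k):
    # Rank first, then filter: can silently return fewer than k hits
    # when the nearest neighbours fall outside the category.
    nearest = sorted(DOCS, key=lambda d: cosine(query, d["vec"]), reverse=True)[:k]
    return [d["id"] for d in nearest if d["category"] == category]
```

Post-filtering can starve the result list; pre-filtering keeps k results but forces the vector index to support filtered traversal, which is exactly the non-standardized part.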

subpixel a day ago

On their homepage, using vanilla search, I entered the first word of a particular funny movie and it was the third result.

Switching on the AI toggle, I entered the same word, and got no results.

be_erik a day ago

Is meilisearch ready for production workloads? I would love to use some of the feature set, but is the only option for HA running multiple instances and keeping them in sync?

  • Kerollmops a day ago

    Meilisearch has been production-ready since v1.0. I wrote it in Rust to ensure it stays production-ready for years and years. Memory-safe languages are here to replace unsafe ones like C++ and to reduce the number of breaches you expose in production.

    Here is an article by Google showing the benefits of using memory-safe languages in production rather than others; it explicitly revolves around Rust [1].

    [1]: https://www.chromium.org/Home/chromium-security/memory-safet...

    • sealeck a day ago

      Writing software in Rust doesn't necessarily mean that it works reliably for real-world workloads. Sure, Rust prevents you from doing lots of stupid things; it is very much in the class of "necessary but not sufficient condition" for writing software (of course, you can also use other languages, but memory safety should be table stakes for all software these days).

    • arccy a day ago

      This reply doesn't inspire confidence at all...

      Made in Rust isn't a magic bullet to be production ready, and I'd be more concerned about logic bugs rather than CVEs.

      1.0 is like the bare minimum to be used in production, but that doesn't necessarily mean it's been battle tested enough to be considered production ready.

      • Kerollmops 12 hours ago

        HuggingFace has been using Meilisearch in production on their website for a year now.

        • holysantamaria 11 hours ago

          And it takes 360ms to get an answer from their server (not full-text; the quicksearch API endpoint), and 2ms to download the results over the network.

          Is this the instant search that you are advertising on your website?

          • Kerollmops 11 hours ago

            V1.14, released yesterday [1], ships with a search embedding cache. Most of the time you see is spent waiting for an OpenAI embedding answer. We also just shipped composite embedders to reduce the network latency when you need to respond quickly to user searches (by running embedders on the Meilisearch server) but still use external APIs to index many documents in batches. Note that it can only work with open-source embedders, the ones HuggingFace serves.

            [1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

  • bigtones a day ago

    We use Meilisearch in production with a 7 million article corpus - it works really well.

    • iambateman a day ago

      My understanding for Meilisearch is that you need enough RAM to keep everything in memory...but you're (probably) not keeping full-text in memory for millions of articles.

      Is it just searching metadata, or do you just have a setup that's beefy enough to support that level of memory?

      Or am I just wrong? :D

      • Implicated 13 hours ago

        Just as a data point...

        I'm running a Meilisearch instance on an AX52 @ Hetzner (64GB DDR5 memory / NVMe / Ryzen 7 7700) dedicated to just meilisearch

        - 191,698 MB in size
        - 13 indexes
        - ~80,000,000 documents

        The primarily searched indexes have 5, 6 and 10 million records each. The index with 10 million records has 4 searchable attributes, 10 filterable attributes and 7 sortable attributes.

        I don't have any idea what kind of search volume there is - the search form is public, the website it's on is displaying content relative to those 5, 6 and 10 million records (each having their own page) and the AI bots are having a field day crawling the site. I don't cache the search results, nor is cloudflare caching the resulting pages since the site is dynamic and records are _constantly_ being added.

        So with all that said - here's the current top output:

        top - 06:33:47 up 257 days, 12:10,  1 user,  load average: 1.05, 1.18, 1.24
        Tasks: 274 total,   1 running, 273 sleeping,   0 stopped,   0 zombie
        %Cpu(s):  5.8 us,  0.1 sy,  0.0 ni, 93.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
        MiB Mem :  63439.1 total,    403.6 free,  16698.8 used,  47065.0 buff/cache
        MiB Swap: 32751.0 total,      2.2 free,  32748.8 used.  46740.3 avail Mem

        2747823 meilise+ 20 0 24.1t 52.0g 36.2g S 94.7 84.0 5w+5d /usr/local/bin/meilisearch --config-file-path /etc/meilisearch.toml

        It's bored. Searches are fast. Doesn't need as much memory as the size of the index to be so.

        The only hiccups I ran into were back before they introduced batch indexing. Things were fine in testing/development, but when I started _really_ loading documents in production it was clear it wasn't going to keep up with the indexing - and then it just _never stopped_ indexing, with very high CPU usage. I jumped into their Discord and connected with someone on the team, who I gave access to the server; they made a few adjustments - didn't fix it, but helped. Then the next update basically solved the high CPU use. I still had issues when loading a lot of documents, but found a Laravel package for batch indexing Laravel Scout-based indexes and that solved things for me. Then they released batch indexing, I stopped using the Laravel-specific batch indexer, and it's been smooth sailing.

        I'll be testing/playing with their vector stuff here shortly, have about 10mil of 60mil generated and a new server with a bunch more memory to throw at it.

        Would recommend Meilisearch.

      • bigtones 19 hours ago

        We needed a 16GB machine to import all the data into Meilisearch, as the batch indexing is quite memory intensive, but once it's all indexed we scaled it back to half that RAM and it works great - very performant.

      • tpayet a day ago

        Meilisearch keeps all the data on disk. It uses memory-mapping to optimize performance; by default everything is safe on disk, and the OS caches the most-needed pages in memory.

        So it works on any machine, really. 2GiB is usually enough for most workloads, but the bigger the dataset, the faster it will be if you give it more memory!

  • tpayet a day ago

    Yup it is, Meilisearch Cloud offers 99.99% SLA :)

    We serve billions of searches monthly for hundreds of customers

saintfiends a day ago

Meilisearch is really good for a corpus that rarely changes from my experience so far. If the documents frequently change and you have a need to have those changes available in search results fairly quickly it ends up with pending tasks for hours.

I don't have a good solution for this use case other than maybe just the good old RDBMS. I'm open to suggestions or any way to tweak Meilisearch for documents that get updated every few seconds. We have about 7 million documents at about 5kb each. What kind of instance do I need to handle this?

  • Kerollmops a day ago

    The best you can do is put Meilisearch on a very good NVMe. I am indexing large streams of content (Bsky posts + likes), and I assure you that I tested Meilisearch on a not-so-good NVMe and on a slow HDD, and oh boy, the SSD is so much faster.

    I am sending hundreds of thousands of messages and changes (to the like counts) into Meilisearch, and so far, so good. It's been a month, and everything is working fine. We also shipped the new batch stats, showing a lot of internal information about indexing step timings [1], to help us prioritize.

    [1]: https://github.com/meilisearch/meilisearch/pull/5356#issue-2...

  • brandonlovesked a day ago

    You have 35 GiB of data; put it in memory and forget about NVMes and HDDs

    • Kerollmops a day ago

      35 GiB is probably a third of the data I index into Meilisearch just for experimenting, and don't forget about the inverted indexes. You wouldn't want to use an O(n) algorithm to search your documents.

      Also, every time you rebooted the engine you would have to reindex everything from scratch. Not a good strategy, believe me.

Hawxy a day ago

Tested Meilisearch recently, was a great experience, getting a multi-index search running in our frontend was very easy. Just wish they had an Australian instance, the closest is Singapore :(

  • drewnick a day ago

    I installed Coolify on a VM (it was featured here last week) and it had a one-click Meilisearch Docker install. I sent 1,000,000 records to it and it "just worked" on a little $8/mo Hetzner instance.

  • jnovek a day ago

    I recently stood up the server in our k8s cluster and that part was also pretty easy, at least compared to elastic.

    • captainkrtek a day ago

      I used Elasticsearch 10 years ago and wasn’t a fan then, this last year decided to try Elastic Cloud and have been quite happy, a ton has matured over the years.

  • tpayet a day ago

    Reach out to the sales or CS team, depending on your workload we could open that region pretty quickly :D

mentalgear a day ago

Notable alternative Orama: https://github.com/oramasearch/orama

> complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.

amazingamazing a day ago

I wish these had pluggable backends separate from the actual implementation of the indices, so you could use your own store rather than having to sync constantly. The performance would likely be worse, but at least you wouldn't have to worry about staleness when rehydrating...

esafak a day ago

What's the hybrid reranking story? Does it support streaming ingestion and how?

  • Kerollmops a day ago

    Meilisearch decided to use hybrid search and avoid fusion ranking. We plan to work on reranking soon, but as far as I know, our hybrid search is so good that nobody has asked for reranking. You can read more about our hybrid search in our blog post [1].

    About streaming ingestion: Meilisearch supports basic HTTP requests and is capable of batching tasks to index them faster. In v1.12 [2], we released our new indexer, which is much faster, makes heavy use of parallel processing, and reduces disk writes.

    [1]: https://www.meilisearch.com/blog/hybrid-search [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

    • searchguy 15 hours ago

      I'm a little confused by your statement that "Meilisearch decided to use hybrid search and avoid fusion ranking" when your website [1] says "Hybrid search re-ranking: The final step involves re-ranking results from both retrieval methods using the Reciprocal Rank Fusion (RRF) algorithm."

      Can you clarify what you mean by "fusion ranking"?

      All hybrid search requires a method to blend keyword and vector search results. RRF is one approach, and cross-encoder-based rerankers is another.

      [1]: https://www.meilisearch.com/blog/hybrid-search

      • dureuill 12 hours ago

        hello, I implemented hybrid search in Meilisearch.

        Whether it uses re-ranking or not depends on how you want to stretch the definition. Meilisearch does not use the rank of the documents in each list of results to compute the final list of results.

        Rather, Meilisearch attributes a relevancy score to each result and then orders the results in the final list by comparing the relevancy score of the documents in each list of results.

        This is usually much better than any method that uses the rank of the documents, because the rank of a document doesn't tell you whether the document is relevant, only that it is more relevant than the documents ranked after it in that list of hits. As a result, these methods tend to mix good and bad results. Since semantic and full-text search are complementary, one method is best for some queries and the other for different queries, so picking results by only considering their rank in their respective lists is really bizarre to me.

        I gather other search engines might be doing it that way because they cannot produce a comparable relevancy score for both the full-text search results and the semantic search results.

        I'm not sure why the website mentions Reciprocal Rank Fusion (RRF) (I'm just a dev, not in charge of this particular blog article), but it doesn't sound right to me. Maybe something got lost in translation. I'll try and have it fixed. EDIT: Reported, this is being fixed.

        By the way, this way of comparing scores from multiple lists of results generalizes very well, which is how Meilisearch is able to provide its "federated search" feature, which is quite unique across search engines, I believe.

        Federated search allows comparing the results of multiple queries against possibly multiple indexes or embedders.
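        A toy sketch of the score-based merging described above, for the curious (this is not Meilisearch's actual code; it just contrasts comparing relevancy scores directly with rank-based fusion, assuming both lists expose comparable scores in [0, 1]):

```python
def merge_by_score(keyword_hits, semantic_hits, limit=10):
    """Merge two (doc_id, score) result lists by comparing relevancy
    scores directly, keeping each document's best score."""
    best = {}
    for hits in (keyword_hits, semantic_hits):
        for doc_id, score in hits:
            if score > best.get(doc_id, -1.0):
                best[doc_id] = score
    # Order the final list by score, not by per-list rank.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:limit]]

keyword = [("a", 0.95), ("b", 0.20)]   # full-text results
semantic = [("c", 0.80), ("b", 0.60)]  # semantic results
print(merge_by_score(keyword, semantic))  # ['a', 'c', 'b']
```

        A rank-based fusion would treat "a" and "c" identically (both are rank 1 in their list) despite the 0.95 vs 0.80 score gap, which is the difference being described.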

k4rli a day ago

LibreChat has it as a dependency. It seems very memory-heavy, like Elasticsearch: 3G+ of memory at all times, even on a new-ish instance with just one user.

  • tpayet a day ago

    Actually, Meilisearch uses the RAM available by design because it uses LMDB (which is memory-mapped) under the hood for key-value storage.

    It's a feature, not a bug :D

    The cool thing about that is that the OS gets to choose which process to allocate memory to, so you can always run it somewhere with less memory available and it will work the same way

    • yellow_lead a day ago

      But is there any way to limit the memory? Sometimes I want to run more than one thing on the box, and I found that Meilisearch doesn't have a "max total memory use" flag

      • irevoire a day ago

        Hey, as the previous person said, you cannot really limit the memory Meilisearch uses.

        But your OS will share Meilisearch's memory with other processes seamlessly; you don't have to do anything. In htop it's the yellow bar, and it's basically a big cache shared between all processes.

        • yellow_lead 15 hours ago

          I haven't found this to be the case in my experience. I may be misunderstanding, but this is my experience running Meili in prod:

          * Meili uses 50% RAM

          * I use 10-20% with another script to occasionally index new data

          * When indexing, Meili could jump to use 70-80% RAM

          * Meili is OOM killed

          • Kerollmops 13 hours ago

            Right. We released a lot of new versions of the engine to improve the indexing part of it. v1.12 improved document indexing a lot! Have you tried the latest version, v1.14, which we released yesterday?

            Meilisearch is capable of limiting its resident (actually malloc'd) memory; however, it requires a bare minimum (about 1 GiB).

  • mitchitized a day ago

    Fire it up in a docker container and limit the RAM usage that way.

    This is a trick I learned years ago with other mmap-based systems.

    • irevoire a day ago

      Are you sure it really limits the RAM? You're still using the same kernel, and if one process is using more memory than another, I would expect the kernel to keep more of its memory pages in RAM.

      What was your strategy to measure that?