[EDM, GED, DMS] roadmap continued: warmup@2016, 2D indexing comparison with the heavyweight open-source solutions #lucene out-of-my-box@2017



Documents [0.]: 00.txt 01.odt 02.doc 03.pdf 04.ods 05.xls 06.odp 07.ppt 08.odt 09.pdf
Images [1.]: 10.jpg 11.gif 12.png
Audios [2.]: 20.mp3 21.wav 22.amr
Videos [3.]: 30.avi 31.mp4 32.mkv

Milestones

Context: data and cluster ready.

       
 


Index [myDoc*.*]... Search [BigData|Big*]...


#3  09/20
#2  10/20
#1  12/20
[Figure labels: vertical limit, horizontal indexing end]

Vertical and horizontal: two dimensions to improve the indexing surface.

And the winner is...
    12/20  mySauce
    10/20  Elasticsearch
    09/20  Solr

  • Solr.
  • "http://jbd-vm01.jbdata.fr:8983/solr/myCollec-0/select?indent=on&q=Big*&fl=id,a_s,a_i,a_f&sort=a_f asc,a_i asc&rows=100&wt=json"
    {
      "responseHeader":{
        "zkConnected":true,
        "status":0,
        "QTime":7,
        "params":{
          "q":"Big*",
          "indent":"on",
          "fl":"id,a_s,a_i,a_f",
          "sort":"a_f asc,a_i asc",
          "rows":"100",
          "wt":"json"}},
      "response":{"numFound":9,"start":0,"docs":[
          {
            "id":".../dev/ged-06/input-20/myDoc-00.txt"},
          {
            "id":".../dev/ged-06/input-20/myDoc-01.odt"},
          {
            "id":".../dev/ged-06/input-20/myDoc-02.doc"},
          {
            "id":".../dev/ged-06/input-20/myDoc-03.pdf"},
          {
            "id":".../dev/ged-06/input-20/myDoc-04.ods"},
          {
            "id":".../dev/ged-06/input-20/myDoc-05.xls"},
          {
            "id":".../dev/ged-06/input-20/myDoc-06.odp"},
          {
            "id":".../dev/ged-06/input-20/myDoc-07.ppt"},
          {
            "id":".../dev/ged-06/input-20/myDoc-10.jpg"}]
      }}
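
The select URL above carries every search parameter in its query string, and characters such as `*`, `,` and spaces need escaping when the URL is built by a program rather than pasted into a browser. A minimal Python sketch of that assembly, using only the standard library: the helper name `solr_select_url` is hypothetical, the host, collection and parameters are copied from the request above, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def solr_select_url(base, collection, params):
    """Build a Solr /select URL; urlencode escapes '*', ',' and spaces."""
    return f"{base}/solr/{collection}/select?{urlencode(params)}"

# Parameters copied from the request shown above.
params = {
    "indent": "on",
    "q": "Big*",
    "fl": "id,a_s,a_i,a_f",
    "sort": "a_f asc,a_i asc",
    "rows": 100,
    "wt": "json",
}

url = solr_select_url("http://jbd-vm01.jbdata.fr:8983", "myCollec-0", params)
print(url)

# Round trip: decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"], decoded["sort"])
```

Decoding the generated query string is a cheap way to check that the escaped URL still means the same thing as the hand-written one.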
    
  • Elasticsearch.
  • curl "http://jbd-vm01.jbdata.fr:9200/mydocs-idx/doc/_search?pretty" -d '{
      "query": {
        "bool": {
          "must": [
            {
              "match" : { "content" : "BigData" }
            }
          ],
          "must_not": [],
          "should": []
        }
      },
      "from": 0,
      "size": 50,
      "sort": [],
      "aggs": {}
    }'
    {
      "took" : 23,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 10,
        ".../dev/ged-06/input-20/myDoc-07.ppt"
        ".../dev/ged-06/input-20/myDoc-01.odt"
        ".../dev/ged-06/input-20/myDoc-00.txt"
        ".../dev/ged-06/input-20/myDoc-10.jpg"
        ".../dev/ged-06/input-20/myDoc-05.xls"
        ".../dev/ged-06/input-20/myDoc-12.png"
        ".../dev/ged-06/input-20/myDoc-03.pdf"
        ".../dev/ged-06/input-20/myDoc-02.doc"
        ".../dev/ged-06/input-20/myDoc-06.odp"
        ".../dev/ged-06/input-20/myDoc-04.ods"
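
The request body above is a `bool` query with a single `must` clause; the empty `must_not`, `should`, `sort` and `aggs` entries should be safe to leave out. A minimal Python sketch that builds the same body as a dictionary: `match_query_body` is a hypothetical helper, and the field name and values are copied from the request above.

```python
import json

def match_query_body(field, text, start=0, size=50):
    """Return the JSON body for a bool query with one 'match' clause."""
    return {
        "query": {"bool": {"must": [{"match": {field: text}}]}},
        "from": start,
        "size": size,
    }

# Same field, search text and paging as the curl request above.
body = match_query_body("content", "BigData")
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids the quoting problems of a JSON string embedded in a shell command.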
    
  • mySauce.
  • java org.apache.lucene.demo.SearchFiles -index .../dev/ged-06/.lucene -query "big*"
    Searching for: big*
    12 total matching documents
    1. .../dev/ged-06/output-20/myDoc-05.txt
    2. .../dev/ged-06/output-20/myDoc-03.txt
    3. .../dev/ged-06/output-20/myDoc-21.txt
    4. .../dev/ged-06/output-20/myDoc-20.txt
    5. .../dev/ged-06/output-20/myDoc-00.txt
    6. .../dev/ged-06/output-20/myDoc-07.txt
    7. .../dev/ged-06/output-20/myDoc-02.txt
    8. .../dev/ged-06/output-20/myDoc-04.txt
    9. .../dev/ged-06/output-20/myDoc-12.txt
    10. .../dev/ged-06/output-20/myDoc-06.txt
    Press (n)ext page, (q)uit or enter number to jump to a page.
    n
    11. .../dev/ged-06/output-20/myDoc-01.txt
    12. .../dev/ged-06/output-20/myDoc-10.txt
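
The three result lists can be compared by file category to see where each score comes from. A small Python sketch, assuming that each `output-20/myDoc-NN.txt` corresponds to the input file numbered `NN` and that the first digit of the number gives the category, as in the legend at the top. The document numbers are transcribed by hand from the listings above.

```python
from collections import Counter

CATEGORIES = {"0": "Documents", "1": "Images", "2": "Audios", "3": "Videos"}

# myDoc numbers transcribed from the three result lists above.
HITS = {
    "Solr": "00 01 02 03 04 05 06 07 10".split(),
    "Elasticsearch": "07 01 00 10 05 12 03 02 06 04".split(),
    "mySauce": "05 03 21 20 00 07 02 04 12 06 01 10".split(),
}

def coverage(doc_numbers):
    """Count hits per category, keyed by the first digit of the file number."""
    return Counter(CATEGORIES[n[0]] for n in doc_numbers)

for engine, docs in HITS.items():
    print(engine, len(docs), dict(coverage(docs)))

# Files returned by mySauce but by neither of the other two engines.
only_mine = sorted(set(HITS["mySauce"]) - set(HITS["Solr"]) - set(HITS["Elasticsearch"]))
print("found by mySauce only:", only_mine)
```

Under this transcription, all three engines return the same eight documents (00 to 07); the differences come from the images and audio files, and no engine returns 08.odt, 09.pdf, 11.gif, 22.amr or any video.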
    

Tika 1.15 upgrade and tuning to increase the indexing surface: (vertical + 2) * (horizontal - 1) = 13.


#3  Solr           09/20
#2  Elasticsearch  10/20
#1  mySauce        13/20

The inescapable Lucene, soon escapable with a grep?
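
A minimal sketch of what such a grep could look like, assuming the text has already been extracted to plain `.txt` files as in the `output-20` directory above. The function name `grep_prefix` and the regular expression are illustrative assumptions, not the author's implementation; unlike Lucene, this scans every file on each query and does no ranking.

```python
import re
import sys
from pathlib import Path

def grep_prefix(directory, prefix):
    """Return sorted names of .txt files containing a word that starts with prefix."""
    # \b anchors the prefix at a word start; matching is case-insensitive.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    matches = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            matches.append(path.name)
    return matches

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python grep_prefix.py <directory of extracted .txt files> <prefix>
    for name in grep_prefix(sys.argv[1], sys.argv[2]):
        print(name)
```

Called with the prefix `big`, this plays the role of the `big*` query, at the cost of rereading the whole corpus for every search.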

TODOs

EDMS#5@2016: Warmup
EDMS#6@2017: Index
EDMS#?: TODO