Regular
Posts: 11
Registered: ‎01-17-2016
1

problem with indexing pdf's on a large database

Hi,

 

I have a large database (around 100 GB).

 

I've noticed that newly imported PDFs are not searchable.

Looking in the EndNote directory, I noticed that the pdf.index.MYD file is 4 GB in size.

I use an NTFS-formatted disk on Windows 7 and 10. Its maximum file size is far beyond this number.
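
For anyone who wants to check their own library for the same symptom, here is a minimal Python sketch. It is not an EndNote tool: the function name `files_near_limit` and the 1% tolerance are illustrative assumptions. It simply lists files whose size sits at or just under 2^32 bytes (4 GiB), which would be consistent with a 32-bit size limit in the software rather than in the file system.

```python
from pathlib import Path

# 2**32 bytes = 4 GiB, the largest size a 32-bit byte offset can address.
FOUR_GIB = 2**32


def files_near_limit(folder, limit=FOUR_GIB, tolerance=0.01):
    """Return (path, size) pairs for files whose size is within
    `tolerance` (as a fraction) of `limit` bytes, or above it."""
    threshold = limit * (1 - tolerance)
    hits = []
    for path in Path(folder).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= threshold:
                hits.append((path, size))
    return hits
```

To use it, call `files_near_limit(...)` with the path to your own library's data folder and print the results. A file sitting right at the 4 GiB mark on an NTFS disk suggests the limit comes from the application, not from Windows.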

 

What is the problem?

What can I do to alleviate this?

What can Thomson Reuters do to correct this?

 

I hope someone can answer this,

 

Nick

Mentor
Posts: 2,087
Registered: ‎09-30-2009
1

Re: problem with indexing pdf's on a large database


Could you provide more information?

 

  1. How were the PDFs imported into EndNote (e.g., Auto Import, .xml file, etc.)? Did you try importing the PDFs in smaller batches instead of one large group? (Importing them in multiple groups may also help identify problematic or corrupted files that may be impeding the import process.) If you have PDFs with long filenames, try shortening the filenames.
  2. For the PDFs that you say aren't searchable:
    a. Did you include PDF or Any Field + PDF from the EndNote field terms (click the pulldown menu) as the search parameters?
    b. Did you check whether the PDFs are text-based or scanned images (essentially a picture of the document, so there's no text that EndNote can read or search)? You can check this by taking one of the problematic PDFs and seeing whether you can either: 1) copy and paste the text into MS Word; or 2) use the Adobe Reader search box to test whether searches can be performed within the PDF.

    If you're dealing with scanned images instead of searchable text PDFs, you still might be able to convert them using the OCR text recognition available only in the full version of Adobe Acrobat.
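
If you want to try the smaller-batches suggestion from step 1, a rough Python sketch like the one below could copy a folder of PDFs into numbered subfolders that you then import one at a time. The function name `batch_pdfs`, the folder naming, and the batch size of 200 are all assumptions chosen for illustration. The script does not touch EndNote itself.

```python
from pathlib import Path
import shutil


def batch_pdfs(source, dest, batch_size=200):
    """Copy the PDFs in `source` into numbered subfolders of `dest`,
    at most `batch_size` files per subfolder. Returns the subfolders."""
    pdfs = sorted(
        p for p in Path(source).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    batches = []
    for start in range(0, len(pdfs), batch_size):
        folder = Path(dest) / f"batch_{start // batch_size + 1:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for pdf in pdfs[start:start + batch_size]:
            # copy2 keeps timestamps; the originals are left untouched.
            shutil.copy2(pdf, folder / pdf.name)
        batches.append(folder)
    return batches
```

If one batch fails to import or index, the problematic file is then narrowed down to that subfolder.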
Regular
Posts: 11
Registered: ‎01-17-2016
1

Re: problem with indexing pdf's on a large database

Hi,

 

I imported the PDFs as usual: as a file or as a folder.

As for the PDFs themselves, I tested this with my own thesis. I imported it years ago, and it shows up when I search for some words. The newly imported PDF of the same thesis (with a different name) does not show up.

When I make a new library, there is no problem. But I want to use only one library for my books and papers. The library consists of around 15,000 papers and 8,000 books.

 

So there is no problem with the PDFs themselves.

 

The problem is somewhere else, namely with the previously mentioned file. It is apparently maxed out. This is clearly a software bug. The programmers used a file definition from the old DOS era: FAT32. Those formats hold files with a maximum length of 4 GB.

 

So can you please email this thread to the programmers and have them respond.

 

Thanks,

 

Nick 

Regular
Posts: 11
Registered: ‎01-17-2016
1

Re: problem with indexing pdf's on a large database

have a look at the following directory:

 

 

Regular
Posts: 11
Registered: ‎01-17-2016
1

Re: problem with indexing pdf's on a large database

above is from the rdb directory.

 

There is also a tdb directory with the same filenames:

Mentor
Posts: 2,087
Registered: ‎09-30-2009
1

Re: problem with indexing pdf's on a large database


nick-yellow wrote:

 

So can you please email this thread to the programmers and have them respond.

 


I suggest you notify tech support, as I don't work for EndNote/Thomson Reuters and have no contact with the programmers.

 

Technical support

For help using EndNote

Monday – Friday, 9 a.m. – 8 p.m. Eastern (GMT -5)
+1-800-336-4474, press 4

Submit a technical support request

International support contacts

Regular
Posts: 11
Registered: ‎01-17-2016
1

Re: problem with indexing pdf's on a large database

I just sent them an email (via the input form) with a link to this thread.

 

I'll post updates in this thread when they solve it.

 

Cheers,

 

Nick

Regular
Posts: 11
Registered: ‎01-17-2016
1

Re: problem with indexing pdf's on a large database

Many debugging sessions have been performed on my database, and now it's confirmed: it's a bug, and a big one at that.

 

Last email from them:

"Thank you very much for your continued patience.

I received a reply from our internal team. This issue has been raised as a bug in our internal system. Our developers are working towards a fix. Please note that I do not have a time frame when this will be fixed. That being said, as a work around you could split the library into parts and work with it.

I do apologise for any inconvenience caused."

 

I hope they will fix it soon, since splitting my database is not really an option: it would take a lot of time.

Mentor
Posts: 2,087
Registered: ‎09-30-2009
1

Re: problem with indexing pdf's on a large database

Sorry to hear the bug was confirmed, but thanks for the update, as I was wondering how things were progressing. You didn't mention what version of EndNote you're using, but did tech support indicate the bug is isolated to EndNote X7?

 

To bypass importing PDFs into EndNote X7, I am wondering whether using an earlier version of EndNote, such as X6, would enable proper importing and storing of the PDFs. If so, the EndNote X6 library with the PDFs could be exported as an .XML file, which in turn could be imported into EndNote X7. I don't know if you have the time or inclination to test it, but otherwise you might try asking tech support if they could perform the import for you or fix your database, given that they can't give you a definite time frame for a fix.

Regular
Posts: 11
Registered: ‎01-17-2016
1

Re: problem with indexing pdf's on a large database

Hello,

 

Has the indexing problem been solved in the latest X8 version?

 

Nick