Problem with indexing PDFs in a large database

Hi,

I have a large database (around 100 GB).

I've noticed that newly imported PDFs are not searchable.

Looking in the EndNote directory, I noticed that the pdf.index.MYD file is 4 GB in size.

I use an NTFS-formatted disk under Windows 7 and 10, so the file system's maximum file size is far beyond this number.

What is the problem?

What can I do to alleviate this?

What can Thomson Reuters do to correct this?

I hope someone can answer this,

Nick

Could you provide more information?

  1. How were the PDFs imported into EndNote (e.g., Auto Import, .xml file, etc.)? Did you try importing the PDFs in smaller batches instead of one large group? (Importing them in multiple groups may also help identify problematic or corrupted files that may be impeding the import process.) If you have PDFs with long filenames, try shortening the file names.
  2. For the PDFs that you say aren't searchable:
    a. Did you select PDF or Any Field + PDF from the EndNote field list (click the pulldown menu) as the search field?
    b. Did you check whether the PDFs are text-based or scanned images (essentially a picture of the document, so there's no text that EndNote can read or search)? You can check this by taking one of the problematic PDFs and seeing whether you can either: 1) copy and paste the text into MS Word; or 2) use the Adobe Reader search box to run a search within the PDF.
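The batch suggestion in item 1 can be sketched in Python as below. This is only an illustration, not anything EndNote provides: the function names `batch_pdfs` and `long_names`, the batch size of 500, and the 100-character filename threshold are all assumptions chosen for the example. It just lists the PDFs in a folder in fixed-size groups and flags long file names, so each group can then be imported into EndNote by hand.

```python
from pathlib import Path


def batch_pdfs(folder, batch_size=500):
    """Split the PDFs found directly in `folder` into lists of at most `batch_size` files.

    The batch size is an arbitrary choice for illustration.
    """
    pdfs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [pdfs[i:i + batch_size] for i in range(0, len(pdfs), batch_size)]


def long_names(paths, max_len=100):
    """Return the paths whose file name is longer than `max_len` characters.

    The threshold of 100 is an assumption, not a documented EndNote limit.
    """
    return [p for p in paths if len(p.name) > max_len]


# Example usage (the folder path is hypothetical):
# for number, batch in enumerate(batch_pdfs(r"D:\papers\to_import"), start=1):
#     print(f"batch {number}: {len(batch)} files, "
#           f"{len(long_names(batch))} with long names")
```

If one batch fails to import or index while the others succeed, the problematic file is somewhere in that batch, which narrows the search considerably.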

If you're dealing with scanned images instead of searchable text PDFs, you might still be able to convert the scanned images using the OCR text recognition available only in the full version of Adobe Acrobat.

Hi,

I imported the PDFs as usual: as a file or as a folder.

Anyway, regarding the PDFs themselves: I tested it with my own thesis. I imported it years ago, and it shows up when I search for certain words. The newly imported PDF (with a different name) does not show up.

When I make a new library, there is no problem. But I want to use only one library for my books and papers. The library consists of around 15,000 papers and 8,000 books.

So there is no problem with the PDFs themselves.

The problem is somewhere else, namely with the previously mentioned file. It has apparently maxed out. This is clearly a software bug. The programmers seem to have used a file definition from the old DOS era, like FAT32. Such formats cap files at a maximum length of 4 GB.
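For anyone who wants to check their own library for the same symptom, here is a minimal Python sketch. It rests on the assumption made in this post, which is a hypothesis rather than something confirmed by the vendor: that an index file stops growing at 4 GB, i.e. 2^32 = 4,294,967,296 bytes. The function name `files_near_limit` and the 5% margin are made up for the example. It walks a folder and reports any file at or close to that size.

```python
from pathlib import Path

FOUR_GIB = 2 ** 32  # 4,294,967,296 bytes, the ceiling suspected in this thread


def files_near_limit(root, limit=FOUR_GIB, margin=0.05):
    """Return (path, size_in_bytes) for every file under `root` whose size is
    at least (1 - margin) * limit, largest first.

    `margin` is an arbitrary tolerance chosen for illustration.
    """
    threshold = int(limit * (1 - margin))
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= threshold:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Example usage (the library path is hypothetical):
# for path, size in files_near_limit(r"D:\EndNote\MyLibrary.Data"):
#     print(f"{size / FOUR_GIB:.2%} of 4 GiB: {path}")
```

A file such as pdf.index.MYD showing up at almost exactly 100% would be consistent with the cap described above, although it would not prove the cause.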

So can you please email this thread to the programmers and have them respond.

Thanks,

Nick 

Have a look at the following directory:

The above is from the rdb directory.

There is also a tdb directory with the same filenames:

@nick_yellow wrote:

 

So can you please email this thread to the programmers and have them respond.

 

I suggest you notify tech support, as I don't work for EndNote/Thomson Reuters and have no contact with the programmers.

Technical support

For help using EndNote

Monday – Friday, 9 a.m. – 8 p.m. Eastern (GMT -5)
+1-800-336-4474, press 4

Submit a technical support request

International support contacts

I just sent them a message (via the web form) with a link to this thread.

I'll post updates in this thread when they solve it.

Cheers,

Nick

Many debugging sessions have been performed on my database, and now it's confirmed: it's a bug, and a big one at that.

Their last email:

"Thank you very much for your continued patience.

I received a reply from our internal team. This issue has been raised as a bug in our internal system. Our developers are working towards a fix. Please note that I do not have a time frame when this will be fixed. That being said, as a work around you could split the library into parts and work with it.

I do apologise for any inconvenience caused."

I hope they fix it soon; splitting my database is not really an option because it would take a lot of time.

Sorry to hear the bug was confirmed, but thanks for the update, as I was wondering how things were progressing. You didn't mention which version of EndNote you're using, but did tech support indicate whether the bug is isolated to EndNote X7?

To bypass importing the PDFs into EndNote X7, I am wondering whether an earlier version of EndNote, such as X6, would import and store the PDFs properly. If so, the EndNote X6 library with the PDFs could be exported as an .XML file, which could in turn be imported into EndNote X7. I don't know if you have the time or inclination to test it, but otherwise you might ask tech support whether they could perform the import for you or fix your database, given that they can't give you a definite time frame for a fix.

Hello,

Has the indexing problem been solved in the latest X8 version?

Nick

There was an update from X7.7 to X7.7.1, which I think was meant to fix the PDF import problem. I would assume that X8 also has the bug fix, if it was a problem there too.

Importing the PDF as a reference is not the problem.

Indexing is.

I cannot search for keywords in these PDFs.

See post 1.

Hi, I also have this problem with X8. Can anyone tell me how to index 28k PDFs with X8? An EndNote error window appears when I re-index the PDFs after restoring my DB. Please let me know whether this problem is still unsolved. Thanks.

Does the newer X8 version handle big databases with regard to indexing PDFs?

I raise this question again before buying X8. Is the problem solved?

Does the newer X9 version handle big databases with regard to indexing PDFs?

I raise this question again before buying X9. Is the problem solved?