X4 PDF import

I have the new X4, but when I move PDFs into the .data file and then import them, the rest of the fields are not populated. It imports the PDF and then places the name of the PDF file into the title field, which is not how I thought this would work. Have I missed anything?

I am not sure you should first import them into the data file. Just keep them in a temporary folder on your desktop. The PDFs need to be relatively new and have the metadata as part of the file structure for EndNote to be able to parse the information into a record. Older PDFs unfortunately don’t have the information “available”, and obviously any PDF that was produced by scanning a document wouldn’t either.

Leanne is correct - just use the import function to try to import PDFs from an existing location/folder. NEVER put files directly into an EndNote library Data folder. The Data folder is designed only for use/access directly by the EndNote application.

Jason Rollins, the EndNote team

I just tried this with a new article (2010) from an OUP website and this method still did not work. Is there any way to detect which articles have metadata embedded?

I’m having the same problem Jess is experiencing. Any solutions? It is a major problem because this is the feature that made me buy EndNote X4!

Of 722 PDFs imported, only 299 generated proper references… The publication dates range from 2000 to 2010.

After poking around at the results for a bit… it seems that this feature works by searching the actual document for the string ‘DOI’. If so, some older PDFs that contain ‘Digital Object Identifier’ instead will not be found. Searching for a specific string would explain the variability across publication year and publisher that I found.

Multiple DOI detection seems to work: In a small number of cases (6/722) multiple DOIs were detected, and the filename rather than the actual title was inserted into the title field (as per the X4 help).

In some cases, the DOI is detected, but nothing is entered into the fields - in one case, this was because the DOI changed at some time after the PDF was created… that I can understand.

Can someone confirm my theory, so we can have a better idea of the limitations of the pdf import function?

thanks!

nancy
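The behaviour described in this post can be sketched roughly as follows. This is only an illustration of what was observed, not EndNote's actual code: the function name `build_record`, the record keys, and the `lookup` callable are all hypothetical.

```python
def build_record(filename, detected_dois, lookup):
    """Sketch of the observed import outcome (hypothetical names throughout).

    filename      -- name of the imported PDF
    detected_dois -- DOI strings found in the PDF text
    lookup        -- callable taking a DOI and returning a dict of
                     bibliographic fields, or None if nothing was found
    """
    # DOIs are case-insensitive, so normalise before de-duplicating.
    unique = sorted({d.lower() for d in detected_dois})

    # Default outcome: the file name lands in the title field.
    record = {"title": filename, "doi": None, "attachment": filename}

    # No DOI, or several different DOIs: keep the file name as the title.
    if len(unique) != 1:
        return record

    # Exactly one DOI: store it, then try to fill the remaining fields.
    record["doi"] = unique[0]
    metadata = lookup(unique[0])
    if metadata:
        record.update(metadata)
    return record
```

With a `lookup` that returns nothing (for example because the DOI changed after the PDF was created), the record keeps the file name as its title but still carries the DOI, which matches the partial results reported above.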

Your description is basically right - as it says in the EndNote X4 help file:

“Importing PDF Files to Create New References -This feature allows you to convert existing collections of PDF files into EndNote references with minimal typing and copying by extracting Digital Object identifiers (DOI) from PDF files. The system matches DOI information with data available from CrossRef  (www.CrossRef.org) by capturing bibliographic content and creating new EndNote references.”

Jason Rollins, the EndNote team
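For anyone who wants to check by hand whether a DOI resolves to bibliographic data, here is a minimal Python sketch. The help text does not say which CrossRef interface EndNote uses, so the public CrossRef REST API (`https://api.crossref.org/works/<doi>`) is used here purely as an assumption for illustration; the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the public CrossRef REST API, not necessarily what EndNote calls.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the lookup URL, percent-encoding anything unusual in the DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def parse_work(payload):
    """Pick a few reference fields out of a CrossRef 'works' JSON response."""
    msg = payload.get("message", {})
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in msg.get("author", [])
    ]
    date_parts = msg.get("issued", {}).get("date-parts", [[None]])
    return {
        "title": (msg.get("title") or [None])[0],
        "authors": authors,
        "journal": (msg.get("container-title") or [None])[0],
        "year": date_parts[0][0] if date_parts and date_parts[0] else None,
    }


def fetch_work(doi, timeout=10):
    """Query CrossRef over the network; raises an error if the DOI is unknown."""
    with urlopen(crossref_url(doi), timeout=timeout) as resp:
        return parse_work(json.load(resp))
```

If `fetch_work` raises an HTTP error or returns an empty title for a given DOI, CrossRef has no usable record for it, and an import keyed on that DOI would have nothing to fill the fields with.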

Would it be possible to have EndNote search for the string “Digital Object Identifier” in addition to “DOI” in a future update? This would seem to address the problem.

Pardon my ignorance if I failed to fully understand the issue.

Yeah, I just confirmed that the document HAS to have the string “DOI” in it. Recent PDFs from one publisher (DOI printed but not preceded by “DOI”) don’t work, but those from another do (“DOI” precedes the actual DOI).
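To make the difference concrete, here is a small Python sketch comparing a label-dependent search with a looser one. The regular expressions are my own guesses at what such matching might look like, not EndNote's actual rules, and the DOI in the usage note is a made-up placeholder.

```python
import re

# Rough shape of a DOI: "10.", a registrant code, a slash, then a suffix.
DOI_BODY = r"10\.\d{4,9}/[^\s\"<>]+"

# Only matches when the literal label "DOI" comes first (the observed behaviour).
LABELED = re.compile(r"\bDOI\b[:\s]*(" + DOI_BODY + ")", re.IGNORECASE)

# Also accepts the spelled-out label suggested earlier in the thread.
LABELED_LONG = re.compile(
    r"\b(?:DOI|Digital Object Identifier)\b[:\s]*(" + DOI_BODY + ")",
    re.IGNORECASE,
)

# Matches a DOI with no label at all.
BARE = re.compile(r"(?<![\w.])(" + DOI_BODY + ")")


def find_dois(text, pattern):
    """Return every DOI the pattern finds, minus trailing punctuation."""
    return [m.group(1).rstrip(".,;)") for m in pattern.finditer(text)]
```

With these patterns, `find_dois("Digital Object Identifier 10.1000/example.123", LABELED)` finds nothing, while the same text with `LABELED_LONG` or `BARE` returns the DOI. A looser pattern catches more files but is also more likely to pick up DOIs from the reference list, which is presumably why a stricter rule was chosen.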

Could someone give me an example of a PDF that *does* get imported correctly? I have tried 5 different vendors and none of them seem to work.

Linda

I have had ‘import pdf’ work with some PDFs and not with others. (If “Ibm” wants an example, I cannot upload a PDF file in this forum. But check this reference, which is open access; I am giving the DOI, so put it in a Google search.)

10.1073/pnas.262413599

But I want to flag another issue with PDF import: it seems that it takes the pdf file, parses it, and then copies it into the .data folder. There is no option to leave the PDF file where it is. Since this option is provided in the “File attachment dialogue”, it should also be provided in the PDF import, should it not?

Hello - This particular PDF includes the DOI as part of the URL and not as its own field that we would distinctly detect. Also, the DOI is not in the PDF metadata, which would be another source for detection. We can investigate ways to fine-tune this in the future.

Regarding the .Data folder import, we are looking to improve how the actual PDF attachment works.

  • Mathilda, the EndNote team
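As an illustration of the URL case described above, a DOI embedded in a link can be recovered with a few lines of Python. This is only a sketch under my own assumptions: the function name `doi_from_url` is hypothetical, and the example URL in the usage note is a guess at what such a link might look like, not text copied from the PDF.

```python
import re
from urllib.parse import unquote, urlparse

# Rough DOI shape: "10.", a registrant code, a slash, then the suffix.
DOI_IN_PATH = re.compile(r"(10\.\d{4,9}/\S+)")


def doi_from_url(url):
    """Return the DOI embedded in a URL path, or None if there is none."""
    # Decode percent-escapes first, since the slash in a DOI is often encoded.
    path = unquote(urlparse(url).path)
    match = DOI_IN_PATH.search(path)
    return match.group(1).rstrip(".,;)") if match else None
```

For example, `doi_from_url("http://www.pnas.org/cgi/doi/10.1073/pnas.262413599")` returns `"10.1073/pnas.262413599"`, the DOI given earlier in the thread.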

Hi, I’m having a similar problem to JessCook: when I import pdf’s, even new ones, none of the reference fields are populated. The only thing that happens is the name of the pdf file is placed into the title field. Is this a bug that needs to be fixed?

I just imported a folder with 625 PDFs and not a single PDF has pulled in the author, year, journal, etc. Although some of these are scanned files, most are not. In fact, the only field that has been filled in for some of these, other than the title, is the DOI! For example, for one PDF, it put the correct DOI in the DOI field: 10.1093/deafed/enn011. I checked and this article IS in CrossRef.org. But still - nothing. Like another poster said, this is the reason I bought the upgrade.

The connection to http://www.crossref.org/ that EndNote relies on is occasionally a bit temperamental. I just did the following tests with this doi:10.1093/deafed/enn011 and had success with all of them.

  • I searched crossref using the crossref connection file in EndNote X4 - it found the record

  • I downloaded the free PDF of this article from PubMed Central and imported this PDF into EndNote - this successfully created a new EndNote record with the appropriate fielded data and attached the PDF.

  • I created a PDF that contained some text and the string doi:10.1093/deafed/enn011 - this also successfully created a new EndNote record with the appropriate fielded data and attached the PDF.

Based on this, I would suggest trying the import again as the CrossRef service might have been temporarily down.

Also, feel free to send  me a few sample PDFs and we can look into these to see if there are issues that we can tweak in EndNote.

The upcoming EndNote X4.0.2 update will include some additional improvements to the PDF importing feature.

Jason Rollins, the EndNote team

jason [dot] rollins [@] thomsonreuters [dot] com

I am still unsuccessful in getting doi: 10.1093/deafed/enn011 to load any information. In case it was just my download, I also redownloaded the file from the open access source. However, when I import it (File-Import-File, with settings as: Import option - pdf, Duplicates - import all, Text translation - no translation), I get an entry with only the file name as the title (J. Deaf Stud. Deaf Educ.-2009-DesJardin-22-43[1].pdf), the DOI under DOI (10.1093/deafed/enn011), and the PDF attached under file attachments. Nothing else there - no author, journal, etc. What step am I missing?

I finally got this working. Not sure if it was the newest update (X4.0.2) or what happened. But, it’s working - about 1/7th of my articles worked. Not as great as expected, but it’s something!

Do the other PDF articles that you tried to import all contain DOIs that can be correctly resolved against PubMed or CrossRef? If yes, and these are still not importing into EndNote, please send me a sample and we will investigate.

Jason Rollins, the EndNote team

jason [dot] rollins [@] thomsonreuters [dot] com

Jason, I expect that most of them don’t. Some of the articles are scanned articles. For the others, I’ve found that many don’t have the DOI until the last page, don’t have one at all (these articles span decades), or don’t have the label “DOI” before the number. So, I wasn’t terribly surprised that so few imported. Hopefully in the future journals will start putting the DOI earlier on and clearly marked!

I’ve been using X4 now for a couple weeks on the Mac. As far as I can tell, the PDF import function is an epic fail. Not one of the PDFs I tried has worked and each has given the results others have reported; specifically, the name of the PDF in the title field and the DOI field set correctly. Obviously, X4 is reading the DOI. I took the DOI that was generated in the reference and plugged it in to CrossRef.org and it came up accurately. It is not the case that CrossRef was down. There is no X4.0.2 update for the Mac yet. Has that update helped on the Windows side? In any case, this “feature” that was touted as a major reason to upgrade seems much closer to vaporware than anything useful.