Using fragment on xml documents #34

Open
rohith004 opened this issue Mar 17, 2016 · 0 comments


I am trying to parse http://exporter.nih.gov/XMLData/final/RePORTER_PRJABS_X_FY2016_024.zip, which contains a single XML file with "PROJECTS" as the root element.
I want to index multiple documents from this XML file, with /PROJECTS/row as the new root of each document
and /PROJECTS/row/APPLICATION_ID as one of the fields.
My extractor.xml file looks like this:
<document url=".*RePORTER.*" engine="xpath">
  <fragment root="/PROJECTS/row">
    <extract-to field="url">
      <text>
        <expr value="//APPLICATION_ID" />
      </text>
    </extract-to>
    <extract-to field="appl_no">
      <text>
        <expr value="//APPLICATION_ID" />
      </text>
    </extract-to>
    <extract-to field="doc_ref">
      <constant value="some constant" />
    </extract-to>
  </fragment>
</document>
Indexing always fails, leaving the following logs:
Parser

2016-03-17 21:49:43,545 INFO parse.ParseSegment - Parsed (5ms):9095546 9024405 ..and so on
2016-03-17 21:49:43,546 INFO parse.ParseSegment - Parsed (1ms):file:/home/ubuntu/nih/RePORTER_PRJ_X_FY2016_024.xml

Indexer

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nih_core: ERROR: [doc=9095546 9024405..] multiple values encountered for non multiValued field doc_ref: [some constant, some constant...]
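
For reference, here is a minimal sketch of the intended result: one document per /PROJECTS/row, each with a single value for url, appl_no and doc_ref. It is plain Python using only the standard library, not the plugin's code and not a diagnosis of the failure. The two-row sample XML is an assumption built from the two IDs visible in the parser log, and `split_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed sample: two rows, using the two IDs that appear in the parser log.
SAMPLE = """<PROJECTS>
  <row><APPLICATION_ID>9095546</APPLICATION_ID></row>
  <row><APPLICATION_ID>9024405</APPLICATION_ID></row>
</PROJECTS>"""


def split_rows(xml_text, constant="some constant"):
    """Return one dict per /PROJECTS/row (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root is the <PROJECTS> element
    docs = []
    for row in root.findall("row"):  # direct <row> children only
        # Look up APPLICATION_ID relative to this row, not the whole file.
        app_id = row.findtext("APPLICATION_ID", default="").strip()
        docs.append({"url": app_id, "appl_no": app_id, "doc_ref": constant})
    return docs


docs = split_rows(SAMPLE)
for doc in docs:
    print(doc)
```

With this shape, every document carries exactly one doc_ref value, which is what a non-multiValued Solr field accepts. The indexer error above instead shows a single document holding the IDs and the constant from many rows.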
