
How can we automate data loading in datahub 1808 ?

Former Member

I want to automate the Data Hub loading, composition, and publication process. My source system data is in DB tables, and I have written a cron job that prepares "rawFragments" for Data Hub by querying those tables. Now I want to initiate the data loading process through Java code to load the data into Data Hub raw items. I have already automated composition and publication with an event implementation: if I upload my data as CSV from the Data Hub adapter view of Backoffice, it is loaded into raw items and then composition and, later, publication are triggered automatically. But my requirement is to automate the data loading step as well. I want to remove the dependency on manually uploading the data as CSV from Backoffice or by hitting the Data Hub inbound endpoint.

I have my "rawFragments" ready. Can anyone please help me with the internal Data Hub API that can be used to initiate the data loading process automatically via Java code?

Accepted Solutions (0)

Answers (1)


Slava
Product and Topic Expert

Data Hub uses Spring Integration channels to accept data loads. An overall description can be found in the data loading documentation, and the code examples there can help you write your own service for loading data.
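To make the channel-based approach concrete, here is a minimal, self-contained sketch of dispatching raw fragments to an input channel. The `MessageChannel` and `RawFragmentData` types below are local stand-ins so the sketch compiles on its own; in a real Data Hub extension you would instead autowire the actual `rawFragmentDataInputChannel` bean and use the real `RawFragmentData` class and Spring's `MessageBuilder` (exact class names and setters vary by version, so verify against your Data Hub API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for org.springframework.messaging.MessageChannel.
interface MessageChannel {
    boolean send(Object message);
}

// Stand-in mirroring the shape of Data Hub's RawFragmentData
// (field and setter names are assumptions; check your version's API).
class RawFragmentData {
    private String type;
    private String dataFeedName;
    private Map<String, String> valueMap;

    void setType(String type) { this.type = type; }
    void setDataFeedName(String feed) { this.dataFeedName = feed; }
    void setValueMap(Map<String, String> values) { this.valueMap = values; }
}

public class RawFragmentLoader {
    private final MessageChannel rawFragmentDataInputChannel;

    RawFragmentLoader(MessageChannel channel) {
        this.rawFragmentDataInputChannel = channel;
    }

    /** Wraps the fragments in a message and dispatches them to the input channel. */
    boolean load(List<RawFragmentData> fragments) {
        // In a real extension this would be something like:
        // rawFragmentDataInputChannel.send(MessageBuilder.withPayload(fragments).build());
        return rawFragmentDataInputChannel.send(fragments);
    }

    public static void main(String[] args) {
        RawFragmentData fragment = new RawFragmentData();
        fragment.setType("RawProduct");           // raw item type (example value)
        fragment.setDataFeedName("DEFAULT_FEED"); // feed the fragment targets
        Map<String, String> values = new HashMap<>();
        values.put("isoCode", "EN");
        fragment.setValueMap(values);

        // A trivial channel that just accepts the message, standing in
        // for the autowired rawFragmentDataInputChannel bean.
        RawFragmentLoader loader = new RawFragmentLoader(msg -> true);
        System.out.println("sent=" + loader.load(List.of(fragment)));
    }
}
```

The key point is that loading is just sending a message to the right channel; the cron job from the question could call such a loader directly once the fragments are prepared.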

Former Member

I am triggering the data load using the "RawFragmentDataInputChannel" just after Data Hub initialization, by implementing a DataHubInitializationCompletedEvent listener. Below is the code:

But I'm getting the error below while it tries to load the data, even though I'm using DEFAULT_FEED for loading:

Can you please help me out with this?

Slava
Product and Topic Expert

I need time to research this and don't have it right now. I will come back to your question in a day or two unless somebody else solves your problem.

Slava
Product and Topic Expert

Here is what I found. DEFAULT_FEED is also created in a listener for DataHubInitializationCompletedEvent, which means your listener is executed before the listener that creates the feed. Unfortunately, there is no way to control the order in which the listeners are executed. For that reason, here is the list of ideas I came up with:

  • Put a time delay in your listener, i.e. Thread.sleep(), in the hope that while your thread is sleeping, the other listeners are notified concurrently and the essential-data listener in particular finishes its job;

  • Use a feed other than DEFAULT_FEED, which you could create in your listener before dispatching the message;

  • Turn your listener into a timer initializer. That is, you have a service that loads data and a service that has a timer and calls your data load service periodically. The listener simply starts the timer service.

These seem to be the simplest solutions that do not require deep Data Hub hacking.
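The third idea can be sketched with plain `java.util.concurrent`, independent of any particular scheduling library (the data-load service here is a hypothetical stand-in for the service that prepares and dispatches the rawFragments):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the timer-initializer idea: the initialization listener only
// starts the scheduler; the scheduler then calls the data-load service
// periodically, long after all other listeners (feed creation included)
// have finished.
public class TimerInitializer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Stand-in for the service that prepares rawFragments and sends
    // them to the input channel (hypothetical name).
    private final Runnable dataLoadService;

    TimerInitializer(Runnable dataLoadService) {
        this.dataLoadService = dataLoadService;
    }

    /** Called from the DataHubInitializationCompletedEvent listener. */
    void start() {
        // First run after 30s, then every 5 minutes -- tune to your load.
        scheduler.scheduleAtFixedRate(dataLoadService, 30, 300, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ran = new CountDownLatch(1);
        TimerInitializer timer = new TimerInitializer(ran::countDown);
        // For the demo, fire once with no initial delay instead of start().
        timer.scheduler.schedule(timer.dataLoadService, 0, TimeUnit.MILLISECONDS);
        ran.await(5, TimeUnit.SECONDS);
        timer.stop();
        System.out.println("data load invoked");
    }
}
```

Because the timer fires well after initialization, the ordering problem between listeners disappears entirely.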

Former Member
  • Use a feed other than DEFAULT_FEED, which you could create in your listener before dispatching the message;

I have already tried this approach, but the result is the same as with DEFAULT_FEED.

  • Turn your listener into a timer initializer. That is, you have a service that loads data and a service that has a timer and calls your data load service periodically. The listener simply starts the timer service.

I also had the same idea and tried implementing a Quartz cron job that periodically calls my rawFragments preparation service and loads the data. But I'm facing a Spring dependency issue with the Quartz cron job: I get a NullPointerException wherever I use Spring dependency injection in the service that is called from the cron job. If you have any sample code for a timer service implementation together with Spring Integration, please share it; it would be very helpful.

Slava
Product and Topic Expert

I'm surprised the custom pool approach did not work. We used it for our sample event-based auto-composition/publication, which you have probably already seen if you implemented auto-composition for your needs. I don't see why you're getting null there, because the listener runs in a single thread and sequential execution is guaranteed: create a pool and a data feed, then use the feed to load data.
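The sequential listener described above can be sketched like this. The `FeedSetupService` interface is a placeholder for whatever pool/feed management service your Data Hub version exposes; the real service and method names differ, so treat them as assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Data Hub service that manages pools and feeds
// (placeholder names; check your installation's API).
interface FeedSetupService {
    void createPool(String poolName);
    void createFeed(String feedName, String poolName);
}

public class SequentialLoadListener {
    private final FeedSetupService feedSetup;
    private final List<String> steps = new ArrayList<>();

    SequentialLoadListener(FeedSetupService feedSetup) {
        this.feedSetup = feedSetup;
    }

    /**
     * Runs on DataHubInitializationCompletedEvent. Everything in this
     * method executes in one thread, so the order is guaranteed:
     * pool first, then feed, then the load itself.
     */
    void onInitializationCompleted() {
        feedSetup.createPool("CUSTOM_POOL");
        feedSetup.createFeed("CUSTOM_FEED", "CUSTOM_POOL");
        steps.add("load via CUSTOM_FEED"); // dispatch the raw fragments here
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        // A recording stub in place of the real service.
        SequentialLoadListener listener = new SequentialLoadListener(new FeedSetupService() {
            public void createPool(String p) { calls.add("pool:" + p); }
            public void createFeed(String f, String p) { calls.add("feed:" + f); }
        });
        listener.onInitializationCompleted();
        calls.addAll(listener.steps);
        System.out.println(String.join(" -> ", calls));
    }
}
```

Because the feed is created inside the same listener before the load is dispatched, it cannot matter that the DEFAULT_FEED listener has not run yet.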

As for the problems with Quartz, I don't have an example. However, I would recommend searching the Spring support forums, as it's purely a Spring problem. I'm pretty sure there are many examples of using Quartz in a Spring service.