DSpace/C2/Batch-import-of-items/English

From Script | Spoken-Tutorial

Latest revision as of 03:13, 11 September 2020

Script : Batch import of Items

Author : Pankaj Patil

Keywords: SAFBuilder, Batch Import, Batch Import in DSpace, Item Submission, Dublin Core, Mapfile, Metadata Spreadsheet, Batch Job, Spoken Tutorial, Video Tutorial

Visual Cue Narration
Slide: Title Welcome to this spoken tutorial on Batch import of Items.
Slide: Learning Objectives In this tutorial, we will learn to
  • Set up SAFBuilder
  • Create a Simple Archive Format and
  • Do a batch import of Items into a Collection
Slide: System requirements This tutorial is recorded using
  • Ubuntu Linux OS 18.04
  • DSpace version 6.3
  • Firefox web browser and
  • Gedit Text Editor

However, you may use any other web browser and text editor of your choice.

Slide: Pre-requisites To practice this tutorial, you should have
  • A working internet connection
  • Installed DSpace 6.3 on your system
  • The Tomcat service running
Slide: Pre-requisites
  • Administrator authority in DSpace and
  • A Collection created in a Community
  • If not, please go through the prerequisite tutorials on this website.
Slide: Pre-requisites To follow this tutorial, you should have
  • Knowledge of Library Science and
  • Familiarity with Dublin Core metadata standards
Slide : Code files
  • The files used in this tutorial are available in the Code Files link on this tutorial page.
  • Please download and extract the files before practicing.
Slide: Batch Import Feature
  • The Batch Import feature facilitates uploading a large number of Items at once.
  • The DSpace administrator is authorized to use the Batch Import feature.
Slide: Batch Import
  • Items uploaded using Batch Import are directly archived.
Slide: Batch Import Methodology
  • First, the metadata for each Item is entered in a spreadsheet along with its filename.
  • Then, using SAFBuilder, we package the spreadsheet and the Item files into a zip file.
  • Finally, using administrator authority, we upload the zip file.
Slide: Batch Import using SAFBuilder
  • To import a batch of Items, DSpace requires the Items to be in the Simple Archive Format (SAF).
  • SAFBuilder is the tool that is used to create the Simple Archive Format.
Slide: Setting up SAFBuilder
  • SAFBuilder is an add-on tool that any Ubuntu Linux user can set up.
  • The manual to set up SAFBuilder on Windows is provided as Additional Reading Material on this tutorial page.
Slide: Simple Archive Format Let us look at the structure of the Simple Archive Format.

The archive contains one subdirectory per Item. Each subdirectory contains:

  • The dublin_core.xml file, i.e. the Item's metadata
  • The files that come along with the Item and
  • The contents file, which lists the filenames of those files.
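To make this structure concrete, here is a minimal shell sketch that builds one Item of a Simple Archive Format archive by hand, i.e. the kind of layout SAFBuilder generates for us. The directory names SimpleArchiveFormat and item_000, the empty placeholder PDF and the metadata values are hypothetical; the dublin_core / dcvalue markup and the one-filename-per-line contents file follow the DSpace SAF layout described above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working area; SAFBuilder produces a similar layout for us.
work=$(mktemp -d)
item="$work/SimpleArchiveFormat/item_000"   # one subdirectory per Item
mkdir -p "$item"

# 1. dublin_core.xml: the Item's metadata, one <dcvalue> element per value.
cat > "$item/dublin_core.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
  <dcvalue element="title" qualifier="none">Article 1</dcvalue>
  <dcvalue element="contributor" qualifier="author">Author One</dcvalue>
</dublin_core>
EOF

# 2. The file that comes along with the Item (an empty placeholder here).
: > "$item/Article1.pdf"

# 3. contents: the filenames belonging to this Item, one per line.
printf '%s\n' Article1.pdf > "$item/contents"

# List the resulting files.
find "$work/SimpleArchiveFormat" -type f | sort
```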
Narration Only Let us proceed to set up the SAFBuilder tool.
Press Ctrl+Alt+T keys Open the terminal by pressing Ctrl + Alt + T keys simultaneously on the keyboard.

Ensure that you have root permissions to run the commands.

Only Narration Here onwards please press the Enter key after typing each command.
Highlight user spoken I will set up the SAFBuilder tool for the user spoken on my machine.
Only Narration The git tool is used to download SAFBuilder, and the JDK and Maven tools are used to compile it.

These tools were already installed during the DSpace installation.
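If you are not sure whether these tools are available, a quick check like the sketch below can help before cloning. The helper name check_tool is hypothetical, and the sketch assumes the JDK provides the java and javac commands and Maven provides the mvn command.

```bash
#!/usr/bin/env bash

# Report whether a command-line tool is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# git downloads SAFBuilder; java/javac (JDK) and mvn (Maven) compile it.
for tool in git java javac mvn; do
  check_tool "$tool"
done
```

Any tool reported as missing should be installed before continuing.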

[Terminal]: Type

git clone https://github.com/DSpace-Labs/SAFBuilder.git

Type the following command to download SAFBuilder from the DSpace-Labs repository on GitHub.

The SAFBuilder download may take some time depending on your internet speed.

If the download fails, recheck your internet connection or try again after some time.

The SAFBuilder download is now complete on my machine.

[Terminal]: Type

cd SAFBuilder

Now type cd SAFBuilder to change the current directory to the SAFBuilder directory.
[Terminal]: Type

./safbuilder.sh

Then type ./safbuilder.sh to compile SAFBuilder.

Compilation of SAFBuilder has started, and it may take some time to complete. If the compilation fails, recheck your internet connection or try again after some time.

Narration only The SAFBuilder compilation is now successful.
Narration only Next, let us proceed to create a Simple Archive Format package of the Items to be uploaded.
[Terminal]: Type

cd $HOME

Type cd $HOME to change the current directory to the home directory.
[Terminal]: Type

mkdir ItemUpload

Now let us create a directory to store the metadata spreadsheet and all the files to be uploaded.

I will create a directory named ItemUpload. To do so, type mkdir ItemUpload.

Narration only I have downloaded an article and its metadata files in my Downloads folder.

These are provided in the Code Files link.

[Terminal]: Type

cd $HOME/Downloads

Using the cd command, we will switch to the Downloads directory.
[Terminal]: Type ls Type ls to check the contents.
Highlight Article1.pdf and Article2.pdf For this demonstration, we will use Article1.pdf and Article2.pdf files for the batch upload.
[Terminal]: Type

cp Article1.pdf Article2.pdf $HOME/ItemUpload

Copy the files Article1.pdf and Article2.pdf to the ItemUpload directory with this command.
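The mkdir and cp steps above can also be combined into one small function that stops if a file is missing. This is a minimal sketch; the function name stage_files is hypothetical and not part of SAFBuilder or DSpace.

```bash
#!/usr/bin/env bash

# stage_files SRC DEST FILE...
# Hypothetical helper: copy each FILE from SRC into DEST, creating DEST
# first (like mkdir ItemUpload) and stopping at the first missing file.
stage_files() {
  local src=$1 dest=$2
  shift 2
  mkdir -p "$dest"
  local f
  for f in "$@"; do
    if [ ! -f "$src/$f" ]; then
      echo "missing: $src/$f" >&2
      return 1
    fi
    cp "$src/$f" "$dest/"
  done
}
```

For this tutorial it would be called as stage_files "$HOME/Downloads" "$HOME/ItemUpload" Article1.pdf Article2.pdf, and the same call works later for Article1-2metadata.csv.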
Narration only Now, let us proceed to check the metadata spreadsheet file.
Text on screen: In Windows OS, select Open with Excel Right-click on Article1-2metadata.csv and select Open with LibreOffice Calc as shown here.
Point to dialog box Text Import The Text Import dialog box appears, with settings for how the text is imported.
Click the button OK Keep the default settings and click on the OK button.
Point to Article1-2metadata.csv Article1-2metadata.csv opens as a spreadsheet.
Point to header row Observe that the first row is the header row.
Point to columns of header row Each column of the header row is a Dublin Core element.

Each column corresponds to a field in the Item Submission form.

Narration only It is mandatory to use header names strictly as provided in this metadata spreadsheet.
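Because the header names must be exact, a small pre-flight check on the header row can catch typos before running SAFBuilder. This is a hypothetical sketch, not a SAFBuilder feature: it assumes a plain header row without quoted commas, and that every metadata column uses the dc.element or dc.element.qualifier form seen in this tutorial's spreadsheet.

```bash
#!/usr/bin/env bash

# check_header CSV
# Hypothetical check: the first header must be "filename" and every other
# header must look like dc.element or dc.element.qualifier.
check_header() {
  local header
  header=$(head -n 1 "$1")
  header=${header%$'\r'}          # tolerate Windows (CRLF) line endings
  local -a cols
  IFS=',' read -r -a cols <<< "$header"
  if [ "${cols[0]}" != "filename" ]; then
    echo "first column must be 'filename', got '${cols[0]}'" >&2
    return 1
  fi
  local re='^dc\.[A-Za-z]+(\.[A-Za-z]+)?$'
  local col
  for col in "${cols[@]:1}"; do
    if ! [[ $col =~ $re ]]; then
      echo "unexpected header: '$col'" >&2
      return 1
    fi
  done
}
```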
Point to column filename The first column is filename, which holds the name of the file to be uploaded.
Narration only The next columns sequentially represent each field in the Item Submission form.
Point to dc.contributor.author For example, the first field in the Item Submission form is Authors.

The corresponding column for Author in the spreadsheet is dc.contributor.author.

Point to multiple columns of

dc.contributor.author

Author is a multi-value field in the Item Submission form.

So, it is represented by multiple columns, each with the header dc.contributor.author.

Narration only For multi-value fields in the Item Submission form, columns can be added or removed as needed.
Narration only Similarly, we have columns for other fields in the Item Submission form.
Highlight dc.title, dc.date.issued, dc.publisher, etc. Single-value fields in the Item Submission form are represented using a single column.
Highlight dc.identifier.ismn, dc.identifier.issn, dc.identifier.isbn Multi-value fields in the Item Submission form are represented using multiple columns.
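To see how these columns relate to the dublin_core.xml file of the Simple Archive Format, here is a hypothetical illustration of the mapping that SAFBuilder performs for us. It is not SAFBuilder's code: it assumes plain comma-separated values without quoting, splits each header into element and qualifier, and turns repeated columns into repeated dcvalue lines.

```bash
#!/usr/bin/env bash

# row_to_dc HEADER ROW
# Hypothetical illustration: turn one data row into dublin_core.xml markup.
# "dc.contributor.author" -> element="contributor" qualifier="author";
# "dc.title" -> element="title" qualifier="none".
# Assumes plain comma-separated values with no quoting or embedded commas.
row_to_dc() {
  awk -v header="$1" -v row="$2" 'BEGIN {
    n = split(header, h, ",")
    split(row, v, ",")
    print "<dublin_core>"
    for (i = 2; i <= n; i++) {        # column 1 is the filename, not metadata
      if (v[i] == "") continue        # unused multi-value columns stay empty
      m = split(h[i], p, "[.]")
      qual = (m >= 3) ? p[3] : "none"
      val = v[i]
      gsub(/&/, "\\&amp;", val)       # minimal XML escaping, & first
      gsub(/</, "\\&lt;", val)
      gsub(/>/, "\\&gt;", val)
      printf "  <dcvalue element=\"%s\" qualifier=\"%s\">%s</dcvalue>\n", p[2], qual, val
    }
    print "</dublin_core>"
  }'
}
```

For example, two dc.contributor.author columns produce two separate author dcvalue lines, which is how a multi-value field is stored.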
Point to row of Article1.pdf Each row in the spreadsheet represents a separate Item.
Point to row of Article1.pdf and Article2.pdf For each Item to be uploaded, write the filename and metadata as shown.
Point to Article1-2metadata.csv The metadata spreadsheet should be saved in CSV format only.
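As a plain-text illustration of these rules, the sketch below writes a small two-Item spreadsheet in CSV format and counts its Items. The file name sample-metadata.csv and all values are made-up placeholders, not the contents of Article1-2metadata.csv. Note the empty second author column for the second Item: it has only one author.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A hypothetical two-Item spreadsheet saved as plain CSV.
dir=$(mktemp -d)
csv="$dir/sample-metadata.csv"
cat > "$csv" <<'EOF'
filename,dc.contributor.author,dc.contributor.author,dc.title,dc.date.issued
Article1.pdf,Author One,Author Two,First sample article,2020
Article2.pdf,Author Three,,Second sample article,2020
EOF

# Every non-empty row after the header row is one Item.
items=$(tail -n +2 "$csv" | grep -c .)
echo "Items in spreadsheet: $items"
```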
Close the Calc Close the Calc window now.
Switch back to terminal Switch back to the terminal.
[Terminal]: Type

cp Article1-2metadata.csv $HOME/ItemUpload

Now, copy the metadata spreadsheet file to the ItemUpload directory.

To do so, type the command as shown here.

[Terminal] : Type

cd $HOME/ItemUpload

Now let us check the contents of the ItemUpload directory.

Using the cd command, change the current working directory to the ItemUpload directory.

[Terminal] : Type ls Then type ls.
Highlight Article1.pdf, Article2.pdf, Article1-2metadata.csv The ItemUpload directory contains the files to be uploaded and their metadata in a CSV file, i.e. Article1.pdf, Article2.pdf and Article1-2metadata.csv.
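Before building the archive, it is worth confirming that every file named in the filename column really sits next to the spreadsheet, since that is where this tutorial places them. The function name check_listed_files is hypothetical; it assumes filename is the first column and holds one name per row without commas.

```bash
#!/usr/bin/env bash

# check_listed_files CSV
# Hypothetical check: every name in the filename column (column 1) must
# exist in the same directory as the CSV. Reports all missing files.
check_listed_files() {
  local csv=$1 dir missing=0 name
  dir=$(dirname "$csv")
  while IFS=, read -r name _ || [ -n "$name" ]; do
    name=${name%$'\r'}
    [ -z "$name" ] && continue
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done < <(tail -n +2 "$csv")
  return "$missing"
}
```

In this tutorial it would be run as check_listed_files "$HOME/ItemUpload/Article1-2metadata.csv".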

Narration only Now, let us proceed to create a zip file in the SAF format.
[Terminal]: Type

cd $HOME/SAFBuilder

Type this command to change the present working directory to SAFBuilder.
[Terminal]: Type

./safbuilder.sh -c $HOME/ItemUpload/Article1-2metadata.csv -z

Type the next command as shown to prepare the Simple Archive Format zip file.

The Simple Archive file creation is successful.

Narration only The Simple Archive file in zip format will be created in the directory of the metadata spreadsheet.

In our case it is ItemUpload.
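The build step and the check for the zip file can be wrapped together. This is a minimal sketch under stated assumptions: SAFBuilder is checked out in $HOME/SAFBuilder as above, it is called with the same -c and -z options shown in this tutorial, and the zip is named SimpleArchiveFormat.zip in the spreadsheet's directory, as seen in the ls output. The function names build_saf and saf_zip_path are hypothetical.

```bash
#!/usr/bin/env bash

# saf_zip_path CSV
# Print where the zip is expected: SimpleArchiveFormat.zip in the same
# directory as the metadata spreadsheet.
saf_zip_path() {
  printf '%s/SimpleArchiveFormat.zip\n' "$(cd "$(dirname "$1")" && pwd)"
}

# build_saf CSV
# Hypothetical wrapper: run SAFBuilder from $HOME/SAFBuilder with the
# -c and -z options used in this tutorial, then confirm the zip exists.
build_saf() {
  if [ ! -f "$1" ]; then
    echo "no such spreadsheet: $1" >&2
    return 1
  fi
  local csv zip
  # Make the path absolute, because SAFBuilder is run from its own directory.
  csv="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  zip=$(saf_zip_path "$csv")
  (cd "$HOME/SAFBuilder" && ./safbuilder.sh -c "$csv" -z) || return 1
  if [ -f "$zip" ]; then
    echo "created: $zip"
  else
    echo "expected $zip but it was not created" >&2
    return 1
  fi
}
```

Usage: build_saf "$HOME/ItemUpload/Article1-2metadata.csv".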

[Terminal] : Type

cd $HOME/ItemUpload

Type this command to change the current directory to ItemUpload directory.
[Terminal] : Type ls Type the command ls to get a list of files in ItemUpload directory.
Highlight SimpleArchiveFormat.zip SimpleArchiveFormat.zip is seen here.
Narration only Now, let us proceed to upload the zip file for batch import in DSpace.
Web browser >> Address bar >> localhost:8080 Open the DSpace interface.
Log into DSpace with admin role

Email: dspace.u1@gmail.com Password: u1pass

Log in to DSpace with your administrator authority.

I will log in with my administrator authority.

Click on the logged in tab Click on the Logged in tab at the top right corner.
Select Administer Select Administer from the drop-down.
Click on Content tab Click on the Content tab in the Navigation bar.
Select Batch import From the drop-down, select Batch import.
Point to Batch import The Batch import page opens.
Point to Select the type of the input data In the Select the type of the input data field, Simple Archive Format (zip file via upload) is selected by default.
click on Browse button To upload the SAF zip file, click on the Browse button in the Select data file to upload field.
Point to File Upload The File Upload dialog box opens up.
Select SimpleArchiveFormat.zip Browse and select the file SimpleArchiveFormat.zip
Click Open button Then, click on the Open button.
Point to SimpleArchiveFormat.zip On success, the name of the file is displayed next to the Browse button.
Click on Select Collection drop down Click on the Select the owning collection of the items drop-down.
Select the Articles Collection From the list, select the Collection into which we want to upload the Items.

I will select Articles.

Point to Select other collections that the items will belong to The Select other collections that the items will belong to field appears above the Upload button.

For this demonstration, I’m not selecting any other Collection.

Click Upload button Click on the Upload button at the bottom of the page.
Point to notification A message is displayed, “The job was taken over, an email will be sent as soon as it is finished”.
Only Narration We can see the progress of the batch import in the My DSpace page.
Click on My DSpace link Click on My DSpace link.
Point to Batch imports In the Batch Imports section we can see the timestamp of the job submission for the batch import.
Point to success We can also see the success status of this batch import job.
click on show more link To view more details, click on Show more link next to the timestamp of the job submission.
Point to Items to be imported and Items imported The labels Items to be imported and Items imported are displayed, along with the number of Items for each.
click on the link show items To view the imported Items, click on the link Show items next to the label Items imported.
Narration Only The mapfile maps each Item uploaded in the batch to its handle number.
Point to the Download mapfile button Download the mapfile by clicking on the Download mapfile button below the label Items imported.
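Once downloaded, the mapfile is a plain-text file that can be read in the terminal. The sketch below assumes each line has the form of an item directory name followed by a space and its handle (for example item_000 123456789/11); both the helper name list_handles and the sample values are hypothetical.

```bash
#!/usr/bin/env bash

# list_handles MAPFILE
# Hypothetical helper: print one "item -> handle" line per imported Item
# and a final count. Assumes each line is "<item directory> <handle>".
list_handles() {
  awk 'NF >= 2 { printf "%s -> %s\n", $1, $2; n++ }
       END { printf "%d item(s) imported\n", n }' "$1"
}
```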
Narration Only We can also delete the Items that were uploaded using batch import.
Point to Delete uploaded items & remove imports button To do so, use the Delete uploaded items & remove import button next to the Download mapfile button.
Narration Only For this demonstration, I will not be deleting the Items imported as a batch.
Narration only Now, let us cross-verify the batch import of the Items in the Collection.
click the Browse tab To do so, click the Browse tab in the Navigation bar.
Select Communities and Collections Then click on Communities and Collections from the drop-down.
Select the Articles Collection Select the Articles Collection.
Point to Collection Home Page The Collection Home Page appears.
Scroll down Scroll down to locate the Items we imported in the Collection.
Point to Items We can see that the Items submitted using batch import are successfully uploaded here.
Point to Items Sometimes, Items uploaded in a batch may take some time to appear in the Collection.
Select the first Item Select the first Item submitted using batch import.
Point to metadata and file We can see the metadata of the Item and its file.
Narration only This means we have successfully uploaded Items using SAFBuilder and Batch import.
Narration only On success of the Batch Import job, an email notification is also sent to the Administrator.

Let us proceed to cross-verify the email notification.

Log into administrator’s email account

Email: dspace.u1@gmail.com Password: d$pace2019*

Log in to your administrator's email account.

This is my administrator's email account.

Point to mail

DSpace - Batch import successfully completed

Here is the email with the subject DSpace - Batch import successfully completed
Narration Only If the email is not in the Inbox, check the Spam folder.

Otherwise, please recheck your internet connection.

Open a mail Let us open the email.
Highlight mapfile path The email contains a mapfile path.
Narration Only So now we have verified the email notification of successful completion of batch import.
Switch back to DSpace Switch back to DSpace.
Logout from DSpace Let us log out from the DSpace interface.
Only Narration This brings us to the end of this tutorial.

Let us summarize.

Slide: Summary In this tutorial we learnt to
  • Set up SAFBuilder
  • Create a Simple Archive Format and
  • Do a batch import of Items into a Collection
Slide: Assignment As an assignment,
  • Create a metadata spreadsheet for Article3.pdf and Article4.pdf
  • Use metadata provided in the Code Files for Article3.pdf and Article4.pdf
  • Upload Article3.pdf and Article4.pdf into the Articles Collection using batch import.
Slide : About Spoken Tutorial project The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Slide : Spoken Tutorial workshops The Spoken Tutorial Project team conducts workshops and gives certificates.

For more details, please write to us.

Slide: Forums Please post your timed queries in this Forum.
Slide: Acknowledgement -I Spoken Tutorial project is funded by MHRD, Government of India.
Slide: Acknowledgement -II DSpace spoken tutorial series is funded by the National Virtual Library of India, Ministry of Culture, Government of India.
Narration only The script and video for this tutorial were contributed by Pankaj Patil from IIT Bombay.

And this is Nancy Varkey signing off. Thanks for joining.

Contributors and Content Editors

Nancyvarkey, Pankajpatil694