This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

Metadata Management

Andrei Tsaregorodtsev edited this page Jan 27, 2015 · 11 revisions


The DIRAC File Catalog allows users to associate any metadata with files and directories. Metadata associated with a directory is inherited by all the files and subdirectories in that directory.

Adding metadata

Metadata is defined in the form of key=value pairs, where the value can be an arbitrary string. Metadata is added to a file with the following commands:

$ # Check the current directory and its contents
$ dpwd
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial
$ dls -l
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial:
-rwxrwxr-x 1 atsareg dirac_user 718 2015-01-26 23:09:29 file1
-rwxrwxr-x 1 atsareg dirac_user 256 2015-01-26 23:10:44 file2
-rwxrwxr-x 1 atsareg dirac_user 483 2015-01-26 23:11:09 file3
$ # Add some metadata to the files
$ dmeta add file1 Year=2014
$ dmeta add file2 Year=2015
$ # Check the metadata associated with a file
$ dmeta ls file1
Year=2014
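The `Key=Value` arguments passed to `dmeta add` can be pictured with the small Python sketch below. The helper name `parse_meta_args` is hypothetical and not part of DIRAC; splitting on the first `=` only, so that a value may itself contain `=`, is an assumption of the sketch rather than documented `dmeta` behavior.

```python
def parse_meta_args(args: list[str]) -> dict[str, str]:
    """Hypothetical helper: turn ["Year=2014", ...] into {"Year": "2014", ...}.

    Assumption: only the first "=" separates key from value, so the value
    (an arbitrary string) may itself contain "=".
    """
    meta: dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected Key=Value, got {arg!r}")
        meta[key] = value  # values stay plain strings at this stage
    return meta


print(parse_meta_args(["Year=2014"]))    # {'Year': '2014'}
print(parse_meta_args(["Comment=a=b"]))  # {'Comment': 'a=b'}
```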

We can also add metadata to the directory containing the files and check it afterwards:

$ dmeta add . DataType=tutorial
$ dmeta ls .
DataType=tutorial

Now let us see what metadata is associated with the files updated above:

$ dmeta ls file1
DataType=tutorial
Year=2014
$ dmeta ls file2
DataType=tutorial
Year=2015

As you can see, the files inherited the metadata of their directory.
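One way to picture this inheritance is a lookup that merges the metadata of every ancestor directory, from the root down, with the file's own metadata. The Python sketch below models the catalog state after the commands above as two plain dictionaries; the names `DIR_META`, `FILE_META` and `effective_metadata` are illustrative and are not DIRAC's API. Letting deeper directories, and then the file itself, win when the same key appears at several levels is an assumption of this sketch; the tutorial does not show such a conflict.

```python
from pathlib import PurePosixPath

BASE = "/vo.formation.idgrilles.fr/user/a/atsareg/tutorial"

# Illustrative snapshot of the catalog after the dmeta commands above.
DIR_META = {BASE: {"DataType": "tutorial"}}
FILE_META = {
    f"{BASE}/file1": {"Year": "2014"},
    f"{BASE}/file2": {"Year": "2015"},
}


def effective_metadata(lfn, dir_meta, file_meta):
    """Return the metadata a file 'sees': inherited plus its own.

    Ancestors are visited from the root down and the file's own metadata is
    applied last, so on a key clash the closest level wins (an assumption).
    """
    merged = {}
    # PurePosixPath.parents lists the nearest directory first; reverse it.
    for directory in reversed(list(PurePosixPath(lfn).parents)):
        merged.update(dir_meta.get(str(directory), {}))
    merged.update(file_meta.get(lfn, {}))
    return merged


for name in ("file1", "file2", "file3"):
    print(name, effective_metadata(f"{BASE}/{name}", DIR_META, FILE_META))
```

In this model file3, which has no metadata of its own, still reports DataType=tutorial.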

Removing metadata

Removing metadata is straightforward:

$ dmeta rm file2 Year
$ dmeta ls file2
DataType=tutorial

file2 now has only the metadata inherited from its directory. Directory metadata is removed in the same way; here we also add a new directory metadatum, Version:

$ dmeta rm . DataType
$ dmeta add . Version=1.0
$ dmeta ls file1
Version=1.0
Year=2014

file1 now has the newly inherited metadatum Version, no longer has DataType, and keeps its own metadatum Year.
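Continuing the same kind of in-memory model, the sketch below replays these removal steps. Because inherited metadata is computed at lookup time in this model rather than copied onto each file, removing `DataType` from the directory makes it disappear from every file at once, which is consistent with the `dmeta ls file1` output above. All names are hypothetical, and the model does not describe how the DIRAC File Catalog stores metadata internally.

```python
from pathlib import PurePosixPath

BASE = "/vo.formation.idgrilles.fr/user/a/atsareg/tutorial"


def effective_metadata(lfn, dir_meta, file_meta):
    """Merge ancestor directory metadata (root first) with the file's own."""
    merged = {}
    for directory in reversed(list(PurePosixPath(lfn).parents)):
        merged.update(dir_meta.get(str(directory), {}))
    merged.update(file_meta.get(lfn, {}))
    return merged


def remove_meta(store, path, key):
    """Model of 'dmeta rm <path> <key>': drop the key stored on that entry only."""
    store.get(path, {}).pop(key, None)


def replay_tutorial():
    """Replay the rm/add steps above and return both stores."""
    dir_meta = {BASE: {"DataType": "tutorial"}}
    file_meta = {f"{BASE}/file1": {"Year": "2014"}, f"{BASE}/file2": {"Year": "2015"}}
    remove_meta(file_meta, f"{BASE}/file2", "Year")  # dmeta rm file2 Year
    remove_meta(dir_meta, BASE, "DataType")          # dmeta rm . DataType
    dir_meta[BASE]["Version"] = "1.0"                # dmeta add . Version=1.0
    return dir_meta, file_meta


dirs, files = replay_tutorial()
print(effective_metadata(f"{BASE}/file1", dirs, files))  # {'Version': '1.0', 'Year': '2014'}
```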

Metadata indices

The metadata defined so far can be regarded as annotations of files or directories: it cannot be used to find files with certain properties. To make that possible, a metadata name can be declared as an index, which means that it can be used in queries for files. Indices are defined as follows:

$ # Define a new integer type index for files called Year
$ dmeta -i f Year=int
$ # Define a new string type index for directories called DataType
$ dmeta -i d DataType=string

Indices are defined separately for files and directories. If metadata with the same name as an index is already defined, it becomes indexed, i.e. searchable. The same inheritance rule applies as for ordinary non-indexed metadata: subdirectories and files inherit the indices of their parent directories. In particular, files inherit the metadata and indices of all the directories above them.

Indices are a powerful tool for classifying user data. However, for efficiency reasons, the number of indices should be kept as small as is reasonable for the community of users of a given DIRAC File Catalog service. For this reason, ordinary users are usually not granted the privilege to add new indices; if they really need one, they should ask the community administrator, who has the appropriate rights.

The indices already defined can be listed with:

$ dmeta -I
     FileMetaFields : {'Year': 'INT'}
DirectoryMetaFields : {'DataType': 'VARCHAR(128)'}
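The bookkeeping behind index definitions can be sketched as two separate registries, one for files and one for directories, matching the `FileMetaFields` and `DirectoryMetaFields` entries in the listing above. The class and method names are hypothetical, and mapping the `int` and `string` types given to `dmeta -i` onto `INT` and `VARCHAR(128)` is only inferred from that listing, not taken from DIRAC's code.

```python
# Inferred from the "dmeta -I" listing above; not DIRAC's actual type table.
SQL_TYPES = {"int": "INT", "string": "VARCHAR(128)"}


class IndexRegistry:
    """Hypothetical model of file and directory metadata indices."""

    def __init__(self):
        self.file_fields = {}  # like FileMetaFields
        self.dir_fields = {}   # like DirectoryMetaFields

    def define(self, target, name, type_name):
        """Model of 'dmeta -i f|d Name=type'; target is 'f' or 'd'."""
        fields = {"f": self.file_fields, "d": self.dir_fields}[target]
        fields[name] = SQL_TYPES[type_name]

    def is_searchable(self, name):
        # Files inherit directory indices, so a file query may use either kind.
        return name in self.file_fields or name in self.dir_fields


registry = IndexRegistry()
registry.define("f", "Year", "int")
registry.define("d", "DataType", "string")
print(registry.file_fields)                # {'Year': 'INT'}
print(registry.dir_fields)                 # {'DataType': 'VARCHAR(128)'}
print(registry.is_searchable("Version"))   # False
```

In this model, Version, which was added earlier as ordinary metadata, stays a plain annotation until an index with that name is defined.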

Searching for files

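Now that the indices are defined and populated, we can look for files with given properties. Because DataType was removed from the directory in the previous section, the example first adds it back:

$ # Restore the DataType metadata removed from the directory earlier
$ dmeta add . DataType=tutorial
$ # Look for files in any directory with a given property
$ dfind / DataType=tutorial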
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file3
$ # Look for files from a given year, limiting the search to the current directory
$ dfind . Year=2014
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
$ # Define year for file2
$ dmeta add file2 Year=2015
$ # Find files from year 2014 and later and combine the search with a chosen DataType
$ dfind . "Year>=2014" DataType=tutorial
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
$ # Find files with a Year from a given set of values
$ dfind . Year=2014,2015
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
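How such conditions combine can be illustrated with a small evaluator: every condition must hold (logical AND), a comma-separated list after `=` matches any of its values, and `>=` compares values using the index type. The sketch below handles only the forms used above (`Key=Value`, `Key=V1,V2` and `Key>=Value`); the `TYPES` table, the function names and the precomputed effective metadata of each file are assumptions made for illustration, not the way `dfind` is implemented.

```python
import re

# Only the condition forms shown above: Key=Value, Key=V1,V2 and Key>=Value.
COND = re.compile(r"^(?P<key>\w+)(?P<op>>=|=)(?P<value>.+)$")
TYPES = {"Year": int, "DataType": str}  # assumption: mirrors the indices above

BASE = "/vo.formation.idgrilles.fr/user/a/atsareg/tutorial"
FILES = {  # effective (inherited + own) metadata per file, precomputed
    f"{BASE}/file1": {"DataType": "tutorial", "Year": "2014"},
    f"{BASE}/file2": {"DataType": "tutorial", "Year": "2015"},
    f"{BASE}/file3": {"DataType": "tutorial"},
}


def parse_condition(text):
    """Split 'Year>=2014' into ('Year', '>=', '2014')."""
    m = COND.match(text)
    if m is None:
        raise ValueError(f"unsupported condition: {text!r}")
    return m["key"], m["op"], m["value"]


def matches(meta, key, op, value):
    """Check one parsed condition against a file's effective metadata."""
    if key not in meta:
        return False  # a file without the key never matches
    cast = TYPES.get(key, str)
    actual = cast(meta[key])
    if op == ">=":
        return actual >= cast(value)
    # "=" with a comma-separated list means "equal to any of these values"
    return actual in {cast(v) for v in value.split(",")}


def find(files, conditions):
    """Return the sorted LFNs whose metadata satisfy all conditions."""
    parsed = [parse_condition(c) for c in conditions]
    return sorted(lfn for lfn, meta in files.items()
                  if all(matches(meta, *cond) for cond in parsed))


for query in (["Year=2014"], ["Year>=2014", "DataType=tutorial"], ["Year=2014,2015"]):
    print(query, [lfn.rsplit("/", 1)[-1] for lfn in find(FILES, query)])
```

The three queries return file1, then file1 and file2 twice, the same sets as the corresponding dfind calls above.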

Note that a search condition containing a character that is special to the shell, such as >, must be enclosed in quotes; otherwise the shell interprets > as an output redirection.
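When a query is built by a script rather than typed by hand, Python's standard `shlex.quote` can add this quoting automatically. The helper `dfind_command` below is hypothetical; it only assembles a command string and does not run anything. `shlex.quote` uses single quotes, which protect `>` just as the double quotes above do.

```python
import shlex


def dfind_command(path, conditions):
    """Build a dfind command line, quoting any argument that contains
    characters special to the shell, such as the '>' in 'Year>=2014'."""
    return " ".join(shlex.quote(arg) for arg in ["dfind", path, *conditions])


print(dfind_command(".", ["Year>=2014", "DataType=tutorial"]))
# dfind . 'Year>=2014' DataType=tutorial
```

Arguments such as Year=2014,2015 contain only characters that `shlex.quote` considers safe and are left unchanged. Alternatively, passing the arguments as a list to `subprocess.run` without `shell=True` avoids shell interpretation altogether.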

Some standard file properties can also be used to make queries more specific. For example, one can look for files that have replicas in a given storage element (SE):

$ # Replicate file1 to the DIRAC-USER storage element
$ drepl file1 -D DIRAC-USER
$ dls -L
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial:
-rwxrwxr-x 2 atsareg dirac_user     718 2015-01-26 23:09:29 file1
   MCIA-irods      dips://ccdiracli04.in2p3.fr:9188/DataManagement/IRODSStorageElement/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
   DIRAC-USER      dips://ccdiracli04.in2p3.fr:9150/DataManagement/StorageElement/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
-rwxrwxr-x 1 atsareg dirac_user     256 2015-01-26 23:10:44 file2
   MCIA-irods      dips://ccdiracli04.in2p3.fr:9188/DataManagement/IRODSStorageElement/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
-rwxrwxr-x 1 atsareg dirac_user     483 2015-01-26 23:11:09 file3
   MCIA-irods      dips://ccdiracli04.in2p3.fr:9188/DataManagement/IRODSStorageElement/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file3
$ # Find files with a replica in the DIRAC-USER storage element
$ dfind . SE=DIRAC-USER
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
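Conceptually, the SE= condition selects files by where their replicas are stored rather than by user metadata. The sketch below models the replica information shown by `dls -L` as a mapping from LFN to storage element names; `REPLICAS` and `find_by_se` are hypothetical names, and restricting the search to a directory by path prefix is an assumption of the model.

```python
BASE = "/vo.formation.idgrilles.fr/user/a/atsareg/tutorial"

# Replica locations as listed by "dls -L" above (illustrative snapshot).
REPLICAS = {
    f"{BASE}/file1": {"MCIA-irods", "DIRAC-USER"},
    f"{BASE}/file2": {"MCIA-irods"},
    f"{BASE}/file3": {"MCIA-irods"},
}


def find_by_se(replicas, se, under="/"):
    """Return the sorted LFNs below 'under' that have a replica at 'se'."""
    prefix = under.rstrip("/") + "/"
    return sorted(lfn for lfn, ses in replicas.items()
                  if se in ses and lfn.startswith(prefix))


print(find_by_se(REPLICAS, "DIRAC-USER", under=BASE))
# ['/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1']
```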

More search criteria are planned to be added to the DIRAC File Catalog.