Metadata Management
The DIRAC File Catalog allows users to associate arbitrary metadata with files and directories. Metadata associated with a directory is inherited by all the files and subdirectories in that directory.
Metadata is defined in the form of key=value pairs, where the value can be an arbitrary string. Metadata is added to a given file with the following commands:
```
$ # Check the current directory and its contents
$ dpwd
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial
$ dls -l
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial:
-rwxrwxr-x 1 atsareg dirac_user 718 2015-01-26 23:09:29 file1
-rwxrwxr-x 1 atsareg dirac_user 256 2015-01-26 23:10:44 file2
-rwxrwxr-x 1 atsareg dirac_user 483 2015-01-26 23:11:09 file3
$ # Add some metadata to the files
$ dmeta add file1 Year=2014
$ dmeta add file2 Year=2015
$ # Check the metadata associated with a file
$ dmeta ls file1
Year=2014
```
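The `Key=value` arguments given to `dmeta add` can be pictured as a small dictionary. The sketch below uses a hypothetical helper, `parse_meta_args` (not a DIRAC function), that splits each argument on its first `=` and keeps the value as a plain string, in line with the statement that values are arbitrary strings. It assumes keys never contain `=`.

```python
def parse_meta_args(args):
    """Turn ["Year=2014"] into {"Year": "2014"}.

    Hypothetical helper for illustration only; not part of DIRAC.
    Assumes keys contain no '='; values may contain '='.
    """
    metadata = {}
    for arg in args:
        # Split on the first '=' only, so "Note=a=b" keeps "a=b" as the value.
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected Key=value, got {arg!r}")
        metadata[key] = value  # values stay plain strings
    return metadata


print(parse_meta_args(["Year=2014"]))  # {'Year': '2014'}
```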
We can also add metadata to the directory containing the files and check it afterwards:
```
$ dmeta add . DataType=tutorial
$ dmeta ls .
DataType=tutorial
```
Now let us look at the metadata associated with the files we updated above:
```
$ dmeta ls file1
DataType=tutorial
Year=2014
$ dmeta ls file2
DataType=tutorial
Year=2015
```
As you can see, the files inherited the metadata of their directory.
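The inheritance rule can be modeled as a merge that walks from the root of the namespace down to the file. The sketch below is a toy in-memory model, not DIRAC code: `catalog` maps each path to its own metadata and `effective_metadata` is a hypothetical name. It also assumes that if the same key appeared at several levels, the entry nearest the file would win; the tutorial does not cover that case.

```python
from pathlib import PurePosixPath

BASE = "/vo.formation.idgrilles.fr/user/a/atsareg/tutorial"

# Toy stand-in for the catalog: path -> metadata set directly on that path.
catalog = {
    BASE: {"DataType": "tutorial"},
    BASE + "/file1": {"Year": "2014"},
    BASE + "/file2": {"Year": "2015"},
}


def effective_metadata(path, catalog):
    """Merge metadata from the root down to `path` (nearest entry wins)."""
    p = PurePosixPath(path)
    merged = {}
    # p.parents lists ancestors nearest-first; reverse it to go root-first.
    for node in [*reversed(p.parents), p]:
        merged.update(catalog.get(str(node), {}))
    return merged


print(effective_metadata(BASE + "/file1", catalog))
# {'DataType': 'tutorial', 'Year': '2014'}
```

Walking root-first and calling `dict.update` at each level is what makes a nearer level override a farther one.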
Removing metadata is straightforward:
```
$ dmeta rm file2 Year
$ dmeta ls file2
DataType=tutorial
```
file2 now has only the metadata inherited from its directory. Directory metadata is removed in the same way. Here we also add a new Version metadatum to the directory:
```
$ dmeta rm . DataType
$ dmeta add . Version=1.0
$ dmeta ls file1
Version=1.0
Year=2014
```
file1 now has the newly inherited metadatum Version, no longer has DataType, and keeps its own metadatum Year.
The metadata defined so far can be regarded as annotations on files or directories: it cannot be used to find files with given properties. To make that possible, a metadata field can be declared as an index, which means it can be used in file queries. Indices are defined as follows:
```
$ # Define a new integer type index for files called Year
$ dmeta -i f Year=int
$ # Define a new string type index for directories called DataType
$ dmeta -i d DataType=string
```
Indices are defined separately for files and directories. If metadata with the same name as the index already exists, it becomes indexed, i.e. searchable. The same inheritance rule applies as for ordinary, non-indexed metadata: subdirectories and files inherit the indices of their directories. In particular, files inherit the metadata and indices of all the directories above them.
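The split between file indices and directory indices can be shown with a toy model. In this sketch (hypothetical names throughout, not DIRAC's implementation) a key is searchable for a file if it is a file index set on the file itself, or a directory index set on some directory above it. A plain annotation such as Version stays invisible to queries.

```python
from pathlib import PurePosixPath

BASE = "/vo.formation.idgrilles.fr/user/a/atsareg/tutorial"

# Toy model: metadata set directly on each path with `dmeta add`.
catalog = {
    BASE: {"DataType": "tutorial", "Version": "1.0"},
    BASE + "/file1": {"Year": "2014"},
}
# Declared separately, as with `dmeta -i f ...` and `dmeta -i d ...`.
file_indices = {"Year": "int"}
dir_indices = {"DataType": "string"}


def searchable_metadata(file_path):
    """Return only the metadata of a file that queries could use."""
    p = PurePosixPath(file_path)
    result = {}
    # Directory indices come from every ancestor directory, root first.
    for directory in reversed(p.parents):
        for key, value in catalog.get(str(directory), {}).items():
            if key in dir_indices:
                result[key] = value
    # File indices come from the file's own metadata.
    for key, value in catalog.get(str(p), {}).items():
        if key in file_indices:
            result[key] = value
    return result


print(searchable_metadata(BASE + "/file1"))
# {'DataType': 'tutorial', 'Year': '2014'}
```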
Indices are a powerful tool for classifying user data. However, for efficiency reasons, the number of indices should be kept as small as is reasonable for the community of users served by a given DIRAC File Catalog service. For this reason, ordinary users are usually not granted the privilege to add new indices; users who really need one should ask the community administrator, who has the appropriate rights.
The indices already defined can be listed with:
```
$ dmeta -I
FileMetaFields : {'Year': 'INT'}
DirectoryMetaFields : {'DataType': 'VARCHAR(128)'}
```
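The listing shows that the `int` and `string` index types are stored as `INT` and `VARCHAR(128)` columns. The type matters for range queries: integers compare numerically, strings character by character. The sketch below (hypothetical helpers; only the two type mappings come from the output above) illustrates the difference for a condition such as `Year>=2014`.

```python
# Mapping taken from the `dmeta -I` output above; other types are not covered.
SQL_TYPES = {"int": "INT", "string": "VARCHAR(128)"}


def describe(indices):
    """Render {'Year': 'int'} as the column types listed by `dmeta -I`."""
    return {name: SQL_TYPES[kind] for name, kind in indices.items()}


def compare_ge(index_type, stored, bound):
    """Evaluate `stored >= bound` under the index type (hypothetical helper)."""
    if index_type == "int":
        return int(stored) >= int(bound)  # numeric comparison
    return stored >= bound  # lexicographic comparison of strings


print(describe({"Year": "int"}))         # {'Year': 'INT'}
print(compare_ge("int", "999", "2014"))     # False: 999 < 2014
print(compare_ge("string", "999", "2014"))  # True: '9' sorts after '2'
```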
Now that the indices are defined and populated, we can look for files with given properties. For example:
```
$ # Look for files in any directory with a given property
$ dfind / DataType=tutorial
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file3
$ # Look for files from a given year, limiting the search to the current directory
$ dfind . Year=2014
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
$ # Define the year for file2
$ dmeta add file2 Year=2015
$ # Find files from 2014 onwards and combine the search with a chosen DataType
$ dfind . "Year>=2014" DataType=tutorial
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
$ # Find files whose Year is one of a given set of values
$ dfind . Year=2014,2015
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
```
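The conditions used above (`Key=value`, `Key>=value`, `Key=v1,v2`, several conditions combined with AND) can be evaluated by a small matcher. This is a toy sketch, not DIRAC's query engine: the operator set, the `INDEX_TYPES` table and the short file names are assumptions made for illustration, and DIRAC's real grammar may accept more forms.

```python
import operator
import re

# Assumed index types, matching the tutorial's Year (int) and DataType (string).
INDEX_TYPES = {"Year": int, "DataType": str}
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
# Two-character operators are listed first so ">=" is not read as ">".
COND = re.compile(r"^(\w+)(>=|<=|>|<|=)(.+)$")


def parse_condition(text):
    """Turn 'Year>=2014' into ('Year', predicate)."""
    m = COND.match(text)
    if not m:
        raise ValueError(f"cannot parse condition {text!r}")
    key, op, raw = m.groups()
    cast = INDEX_TYPES[key]
    if op == "=" and "," in raw:
        # 'Year=2014,2015' means the value is one of the listed ones.
        allowed = {cast(item) for item in raw.split(",")}
        return key, lambda v: v in allowed
    bound = cast(raw)
    return key, lambda v: OPS[op](v, bound)


def dfind(files, conditions):
    """Return the names whose metadata satisfies every condition (AND)."""
    preds = [parse_condition(c) for c in conditions]
    return sorted(
        name
        for name, meta in files.items()
        if all(key in meta and pred(meta[key]) for key, pred in preds)
    )


# Effective metadata at the end of the session above (short names).
files = {
    "file1": {"Year": 2014, "DataType": "tutorial"},
    "file2": {"Year": 2015, "DataType": "tutorial"},
    "file3": {"DataType": "tutorial"},
}
print(dfind(files, ["Year>=2014", "DataType=tutorial"]))  # ['file1', 'file2']
```

A file that lacks a key (file3 has no Year) simply fails any condition on that key.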
Note that in some cases the search condition must be enclosed in quotes to prevent the shell from interpreting special characters such as > (which a POSIX shell treats as output redirection).
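Python's standard `shlex` module, in `punctuation_chars` mode, tokenizes roughly like a POSIX shell and shows why the quotes matter. The sketch assumes the command line is split the way a POSIX shell would split it; `shell_tokens` is a hypothetical helper name.

```python
import shlex


def shell_tokens(command):
    """Split a command line roughly as a POSIX shell would."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)


# Unquoted: '>' becomes a separate redirection operator.
print(shell_tokens("dfind . Year>=2014"))
# ['dfind', '.', 'Year', '>', '=2014']

# Quoted: the whole condition reaches dfind as a single argument.
print(shell_tokens('dfind . "Year>=2014" DataType=tutorial'))
# ['dfind', '.', 'Year>=2014', 'DataType=tutorial']
```

In the unquoted case a real shell would pass only `.` and `Year` to the command and redirect its output into a file named `=2014`.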
Some standard file metadata can also be used to make queries more specific. For example, one can look for files that have replicas in a given storage element (SE):
```
$ # Replicate file1 to the DIRAC-USER storage element
$ drepl file1 -D DIRAC-USER
$ dls -L
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial:
-rwxrwxr-x 2 atsareg dirac_user 718 2015-01-26 23:09:29 file1
    MCIA-irods dips://ccdiracli04.in2p3.fr:9188/DataManagement/IRODSStorageElemen/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
    DIRAC-USER dips://ccdiracli04.in2p3.fr:9150/DataManagement/StorageElement/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
-rwxrwxr-x 1 atsareg dirac_user 256 2015-01-26 23:10:44 file2
    MCIA-irods dips://ccdiracli04.in2p3.fr:9188/DataManagement/IRODSStorageElemen/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file2
-rwxrwxr-x 1 atsareg dirac_user 483 2015-01-26 23:11:09 file3
    MCIA-irods dips://ccdiracli04.in2p3.fr:9188/DataManagement/IRODSStorageElement/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file3
$ # Find files having a replica in the DIRAC-USER storage element
$ dfind . SE=DIRAC-USER
/vo.formation.idgrilles.fr/user/a/atsareg/tutorial/file1
```
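The `SE=` condition can be modeled as a filter over a replica table. The sketch below mirrors the `dls -L` listing above with a hypothetical in-memory mapping from file name to the set of SEs holding a copy; it is not how DIRAC stores replicas.

```python
# Hypothetical replica table mirroring the `dls -L` listing above.
replicas = {
    "file1": {"MCIA-irods", "DIRAC-USER"},
    "file2": {"MCIA-irods"},
    "file3": {"MCIA-irods"},
}


def find_by_se(replicas, se):
    """Return, sorted, the files with at least one replica in `se`."""
    return sorted(name for name, ses in replicas.items() if se in ses)


print(find_by_se(replicas, "DIRAC-USER"))  # ['file1']
```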
More search criteria are planned to be added to the DIRAC File Catalog.