Generic SCSI target mid-level for Linux (SCST)
==============================================
SCST is designed to provide a unified, consistent interface between SCSI
target drivers and the Linux kernel and to simplify target driver
development as much as possible. A detailed description of SCST's
features and internals can be found on its web page,
http://scst.sourceforge.net.
SCST supports the following I/O modes:
* Pass-through mode with a one-to-many relationship, i.e. multiple
initiators can connect to the exported pass-through devices, for
the following SCSI device types: disks (type 0), tapes (type 1),
processors (type 3), CDROMs (type 5), MO disks (type 7), medium
changers (type 8) and RAID controllers (type 0xC).
* FILEIO mode, which allows files on file systems or block devices
to be used as virtual remotely available SCSI disks or CDROMs with
the benefits of the Linux page cache.
* BLOCKIO mode, which performs direct block I/O with a block device,
bypassing the page cache for all operations. This mode works ideally
with high-end storage HBAs and for applications that either do not
need caching between application and disk or need high large-block
throughput.
* "Performance" device handlers, which provide, in a pseudo
pass-through mode, a way to measure performance directly without the
overhead of actually transferring data from/to the underlying SCSI
device.
In addition, SCST supports advanced per-initiator access and device
visibility management, so different initiators can see different sets
of devices with different access permissions. See below for details.
A full list of SCST features and a comparison with other Linux targets
can be found at http://scst.sourceforge.net/comparison.html.
Installation
------------
To see your devices remotely, you need to add a corresponding LUN for
each of them (see below for how). By default, no local devices are
visible remotely. There must be a LUN 0 in each set of LUNs (security
group), i.e. LUN numbering must not start from, e.g., 1. Otherwise you
will see no devices on remote initiators and the SCST core will write
into the kernel log the message:
"tgt_dev for LUN 0 not found, command to unexisting LU?"
It is highly recommended to use the scstadmin utility for configuring
devices and security groups.
SCST initialization should proceed as follows:
1. Load the SCST modules with the necessary module parameters, if needed.
2. Configure targets, devices, LUNs, etc. using either scstadmin
(recommended) or the sysfs interface directly, as described below.
If you experience problems while loading or running the modules, check
your kernel logs (or run the dmesg command for the most recent messages).
IMPORTANT: Without the appropriate device handler loaded, the corresponding
========= devices will be invisible to remote initiators, which could lead
to holes in the LUN addressing, so automatic device scanning by the
remote SCSI mid-level might not notice the devices. In that case
you will have to add them manually via
'echo "- - -" >/sys/class/scsi_host/hostX/scan',
where X is the host number.
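The manual rescan described above can be applied to every SCSI host on
the initiator in one pass. The following is a minimal sketch, not part
of SCST: the function name scst_rescan_hosts and the SCSI_HOST_ROOT
override are hypothetical and exist only so that the loop can be tried
against a scratch directory first.

```shell
# Directory holding the hostX entries; overridable for dry runs (assumption).
SCSI_HOST_ROOT="${SCSI_HOST_ROOT:-/sys/class/scsi_host}"

# Ask every SCSI host to rescan all channels, targets and LUNs.
# Intended to be run as root on the initiator side.
scst_rescan_hosts() {
    for scan in "$SCSI_HOST_ROOT"/host*/scan; do
        # Skip the unexpanded glob when no hosts exist.
        [ -e "$scan" ] || continue
        echo "- - -" >"$scan" || return 1
        echo "rescanned ${scan%/scan}"
    done
}
```

Calling scst_rescan_hosts without arguments prints one "rescanned" line
per host that accepted the request.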
IMPORTANT: Running the target and the initiator on the same host is
========= supported, except in the following 2 cases: swap on a target
exported device and a writable mmap over a file from a target
exported device. The latter means you can't mount a file
system over a target exported device. In other words, you can
freely use any sg, sd, st, etc. devices imported from the target
on the same host, but you can't mount file systems or put
swap on them. This is a limitation of the Linux memory/cache
manager, because in this case a memory allocation deadlock is
possible, like: the system needs some memory -> it decides to
clear some cache -> the cache needs to be written to a
target exported device -> the initiator sends a request to the
target located on the same system -> the target needs memory
-> the system needs even more memory -> deadlock.
IMPORTANT: In the current version, simultaneous access to local SCSI devices
========= via standard high-level SCSI drivers (sd, st, sg, etc.) and
SCST's target drivers is unsupported. This is especially
important for commands executed via sg and st that change
the state of devices and their parameters, because that could
lead to data corruption. If any such command is executed, at
least the related device handler(s) must be restarted. For block
devices, READ/WRITE commands using the direct disk handler are
generally safe.
Usage in failover mode
----------------------
It is recommended to use the TEST UNIT READY ("tur") command to check
whether the SCST target is alive in MPIO configurations.
Device handlers
---------------
Device-specific drivers (device handlers) are plugins for SCST which
help SCST analyze incoming requests and determine parameters
specific to various types of devices. If an appropriate device handler
for a SCSI device type isn't loaded, SCST doesn't know how to handle
devices of this type, so they will be invisible to remote initiators
(more precisely, a "LUN not supported" sense code will be returned).
In addition to device handlers for real devices, there are VDISK, user
space and "performance" device handlers.
The VDISK device handler works over files on file systems and turns
them into virtual remotely available SCSI disks or CDROMs. In addition,
it can work directly over a block device, e.g. a local IDE or SCSI disk
or even a disk partition, where there is no file system overhead.
Compared to sending SCSI commands directly to the SCSI mid-level via
scsi_do_req()/scsi_execute_async(), using block devices has the
advantage that data are transferred via the system cache, so it is
possible to fully benefit from the caching and read-ahead performed by
Linux's VM subsystem. The only disadvantage is that in the FILEIO mode
there is superfluous data copying between the cache and SCST's buffers.
This issue is going to be addressed in one of the future releases.
Virtual CDROMs are useful for remote installation. See below for
details on how to set up and use the VDISK device handler.
"Performance" device handlers for disks, MO disks and tapes skip
(pretend to execute) all READ and WRITE operations in their exec()
method and thus provide a way to measure link performance directly,
without the overhead of actually transferring data from/to the
underlying SCSI device.
NOTE: Since "perf" device handlers don't touch the commands' data
==== buffer on READ operations, it is returned to remote initiators as
it was allocated, without even being zeroed. Thus, "perf" device
handlers pose some security risk, so use them with caution.
Compilation options
-------------------
The following compilation options can be changed using
your favorite kernel configuration Makefile target, e.g. "make xconfig":
- CONFIG_SCST_DEBUG - if defined, turns on some debugging code,
including some logging. Makes the driver considerably bigger and slower,
and produces a large amount of log data.
- CONFIG_SCST_TRACING - if defined, turns on the ability to log events.
Makes the driver considerably bigger and leads to some performance loss.
- CONFIG_SCST_EXTRACHECKS - if defined, adds extra validity checks in
various places.
- CONFIG_SCST_USE_EXPECTED_VALUES - if not defined (default), the
initiator-supplied expected data transfer length and direction are
used only for verification, to return an error or warn if one of
them is invalid. The values decoded locally from the SCSI command
are used instead. This is necessary for security reasons, because
otherwise a faulty initiator can crash the target by supplying an
invalid value in one of those parameters. This is especially
important in pass-through mode. If CONFIG_SCST_USE_EXPECTED_VALUES is
defined, the initiator-supplied expected data transfer length and
direction override the locally decoded values. This might be
necessary if the internal SCST command translation table doesn't
contain a SCSI command that is used in your environment. You can
tell this is the case if you enable the "minor" trace level, see
messages like "Unknown opcode XX for YY. Should you update
scst_scsi_op_table?" in your kernel log, and your initiator returns
an error. Please also report those messages to the SCST mailing list.
Note that not all SCSI transports support supplying expected values.
You should try enabling this option if you have a pass-through device
that doesn't work with SCST, for instance, a SATA CDROM.
- CONFIG_SCST_DEBUG_TM - if defined, turns on debugging of task
management functions: some of the commands on LUN 6 will be delayed
for about 60 sec., making the remote initiator send TM functions,
e.g. ABORT TASK and TARGET RESET. Also define the
CONFIG_SCST_TM_DBG_GO_OFFLINE symbol in the Makefile if you want
the device to eventually become completely unresponsive; otherwise it
will keep cycling through the ABORT and RESET code. Needs
CONFIG_SCST_DEBUG turned on.
- CONFIG_SCST_STRICT_SERIALIZING - if defined, makes SCST send all
commands to the underlying SCSI device synchronously, one after
another. This makes task management more reliable at the cost of some
performance. This is mostly relevant for stateful SCSI devices like
tapes, where the result of a command's execution depends on device
settings defined by previous commands. Disk and RAID devices are
stateless in most cases. The current SCSI core in Linux doesn't allow
all commands to be aborted reliably if they are sent asynchronously to
a stateful device. Turned off by default; turn it on if you use
stateful device(s) and need as much error recovery reliability as
possible. As a side effect of CONFIG_SCST_STRICT_SERIALIZING, on
kernels below 2.6.30 no kernel patching is necessary for pass-through
device handlers (scst_disk, etc.).
- CONFIG_SCST_TEST_IO_IN_SIRQ - if defined, allows SCST to submit selected
SCSI commands (TUR and READ/WRITE) from soft-IRQ context (tasklets).
Enabling it decreases the number of context switches and slightly
improves performance. The goal of this option is to make it possible
to measure the overhead of the context switches. If, after enabling
this option, vmstat output on the target under load doesn't show a
significant decrease in the number of context switches, then your
target driver doesn't submit commands to SCST in IRQ context. For
instance, iSCSI-SCST doesn't do that, but qla2x00t with
CONFIG_QLA_TGT_DEBUG_WORK_IN_THREAD disabled does. This option is
designed to be used with the vdisk NULLIO backend.
WARNING! Enabling this option with any backend other than vdisk
NULLIO is unsafe and can lead to a kernel crash!
- CONFIG_SCST_STRICT_SECURITY - if defined, makes SCST zero allocated data
buffers. Leaving it undefined (default) considerably improves performance
and eases CPU load, but could create a security hole (information
leakage), so enable it if you have strict security requirements.
- CONFIG_SCST_ABORT_CONSIDER_FINISHED_TASKS_AS_NOT_EXISTING - if defined,
when the TASK MANAGEMENT function ABORT TASK tries to abort a
command which has already finished, the remote initiator which sent the
ABORT TASK request will receive a TASK NOT EXIST (or ABORT FAILED)
response. This is the more logical response, since the attempt to
abort the command failed because the command had already finished,
but some initiators, particularly the VMware iSCSI initiator, treat a
TASK NOT EXIST response as a sign that the target has malfunctioned
and try to RESET it, and then sometimes malfunction themselves. So
this option is disabled by default.
- CONFIG_SCST_MEASURE_LATENCY - if defined, provides global and per-LUN
average command processing latency statistics in "latency" files. You
can clear already measured results by writing 0 to each file. Note that
you need a non-preemptible kernel to get correct results.
HIGHMEM kernel configurations are fully supported, but not recommended
for performance reasons.
Module parameters
-----------------
The scst module supports the following parameters:
- scst_threads - sets the number of SCST's threads. By default it
is the CPU count.
- scst_max_cmd_mem - sets the maximum amount of memory in MB allowed to
be consumed by SCST commands for data buffers at any given time. By
default it is approximately TotalMem/4.
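To get a feel for the scst_max_cmd_mem default mentioned above, the
sketch below estimates TotalMem/4 in MB from /proc/meminfo. It is only
an approximation for orientation: how SCST itself computes the default
is not described here, and the function name is hypothetical.

```shell
# Print roughly TotalMem/4 in MB, reading MemTotal (reported in kB)
# from the given meminfo file (defaults to /proc/meminfo).
scst_estimate_max_cmd_mem() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 4 }' "${1:-/proc/meminfo}"
}
```

To set both parameters explicitly at load time, they can be passed on
the modprobe command line, for instance
"modprobe scst scst_threads=4 scst_max_cmd_mem=512" (the values 4 and
512 are arbitrary examples, not recommendations).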
SCST sysfs interface
--------------------
The SCST sysfs interface is designed to be self-descriptive and
self-contained. This means that a high-level management tool for it can
be written once and automatically support any future sysfs interface
changes (attribute additions or removals, new target drivers and dev
handlers, etc.) without any modifications. Scstadmin is an example of
such a management tool.
To achieve that, a management tool should be built not around
drivers and their attributes, but around the common rules those drivers
and attributes follow. You can find those rules in the SysfsRules file.
For instance, each SCST sysfs file (attribute) can contain the mark
"[key]" in its last line. It is added automatically to allow scstadmin
and other management tools to see which attributes they should save in
the config file. If you are manipulating attributes manually, you can
ignore this mark.
The root of the SCST sysfs interface is /sys/kernel/scst_tgt. It has the
following entries:
- devices - the root subdirectory for all SCST devices.
- handlers - the root subdirectory for all SCST dev handlers.
- max_tasklet_cmd - specifies the maximum number of commands from all
connected initiators that can be queued in the SCST core simultaneously
on a single CPU while still being processed on this CPU in soft-IRQ
context in tasklets. If the number of commands exceeds this value,
all of them will be processed only in SCST threads. This prevents
possible starvation of processes on the CPUs serving soft IRQs under
heavy load and in some cases improves performance by spreading the
load more evenly over the available CPUs.
- sgv - the root subdirectory for all SCST SGV caches.
- targets - the root subdirectory for all SCST targets.
- setup_id - allows reading and writing the SCST setup ID. This ID can
be used when the same SCST configuration should be installed on
several targets, but the devices exported from those targets should
have different IDs and SNs. For instance, the VDISK dev handler uses
this ID to generate the T10 vendor specific identifier and SN of the
devices.
- threads - allows reading and setting the number of global SCST I/O
threads. Those threads are used with async. dev handlers, for
instance, vdisk BLOCKIO or NULLIO.
- trace_level - allows enabling and disabling various tracing
facilities. See the content of this file for help on how to use it.
See also the section "Dealing with massive logs" for more info on how
to produce correct logs when you have enabled trace levels that
generate a lot of log data.
- version - read-only attribute showing the version of SCST and the
enabled optional features.
- last_sysfs_mgmt_res - read-only attribute returning the completion
status of the last management command. In the sysfs implementation
there are some problems between internal sysfs and internal SCST
locking. To avoid them, in some cases sysfs calls can return an error
with errno EAGAIN. This doesn't mean the operation failed. It only
means that the operation was queued and has not yet completed. To wait
for it to complete, a management tool should poll this file. If the
operation hasn't yet completed, reading the file will also return
EAGAIN. Once the operation has completed, it will return the result of
the operation (0 for success or -errno for error).
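The EAGAIN handling described for last_sysfs_mgmt_res can be sketched
in shell as follows. This is an illustration under stated assumptions,
not how scstadmin does it: the function name scst_mgmt, the SCST_ROOT
override and the retry limit of 30 are hypothetical, and EAGAIN is
detected by looking for the C-locale error text "temporarily
unavailable" in the shell's error message, which depends on the shell
in use.

```shell
# Root of the SCST sysfs tree; overridable for dry runs (assumption).
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"

# scst_mgmt FILE COMMAND...
# Write COMMAND to the mgmt FILE and wait for a queued operation to finish.
scst_mgmt() {
    mgmt_file=$1
    shift
    # Fast path: the write completed synchronously.
    if err=$( { LC_ALL=C; echo "$*" >"$mgmt_file"; } 2>&1 ); then
        return 0
    fi
    case $err in
        *"temporarily unavailable"*) ;;      # EAGAIN: operation was queued
        *) echo "$err" >&2; return 1 ;;      # any other error is final
    esac
    tries=0
    while [ "$tries" -lt 30 ]; do
        # Reading fails (EAGAIN again) while the operation is pending.
        if res=$(head -n 1 "$SCST_ROOT/last_sysfs_mgmt_res" 2>/dev/null); then
            # 0 means success, -errno means error.
            [ "$res" = "0" ]
            return
        fi
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}
```

A call then looks like
scst_mgmt "$SCST_ROOT/targets/iscsi/TARGET/luns/mgmt" add disk1 0,
where TARGET stands for the name of an existing target.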
The "devices" subdirectory contains a subdirectory for each SCST device.
The content of each device's subdirectory is dev handler specific. See
the documentation for your dev handlers for more info about it, as well
as the SysfsRules file for more info about the rules common to all dev
handlers. SCST dev handlers can have the following common entries:
- exported - subdirectory containing links to all LUNs through which
this device is exported.
- handler - if a dev handler is assigned to this device, this link
points to it. The handler may be unset for pass-through devices.
- threads_num - shows and allows setting the number of threads in this
device's thread pool. If 0, no threads will be created and the global
SCST thread pool will be used. If <0, creation of the thread pool is
prohibited.
- threads_pool_type - shows and allows setting the thread pool type.
Possible values: "per_initiator" and "shared". When the value is
"per_initiator" (default), each session from each initiator will use
a separate dedicated pool of threads. When the value is "shared", all
sessions from all initiators will share the same per-device pool of
threads. Valid only if the threads_num attribute is >0.
- dump_prs - allows dumping persistent reservation information to the
kernel log.
- type - SCSI type of this device.
See below for more information about the other entries of this
subdirectory for the standard SCST dev handlers.
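As an illustration of the threads_num and threads_pool_type attributes,
the sketch below switches a device to a single thread pool shared by
all sessions. The function name and the SCST_ROOT override are
hypothetical, and writing threads_num before threads_pool_type is an
assumption, not a documented requirement.

```shell
# Root of the SCST sysfs tree; overridable for dry runs (assumption).
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"

# scst_dev_shared_pool DEVICE NUM_THREADS
# Give DEVICE one pool of NUM_THREADS threads shared by all sessions.
scst_dev_shared_pool() {
    dev_dir="$SCST_ROOT/devices/$1"
    [ -d "$dev_dir" ] || { echo "no such device: $1" >&2; return 1; }
    # threads_pool_type is valid only when threads_num > 0.
    [ "$2" -gt 0 ] || { echo "NUM_THREADS must be > 0" >&2; return 1; }
    echo "$2" >"$dev_dir/threads_num" &&
        echo "shared" >"$dev_dir/threads_pool_type"
}
```

For example, "scst_dev_shared_pool disk1 4" would request 4 shared
threads for a device named disk1.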
The "handlers" subdirectory contains a subdirectory for each SCST dev
handler.
The content of each handler's subdirectory is dev handler specific. See
the documentation for your dev handlers for more info about it, as well
as the SysfsRules file for more info about the rules common to all dev
handlers. SCST dev handlers can have the following common entries:
- mgmt - this entry allows creating virtual devices and their
attributes (for virtual device dev handlers) or assigning/unassigning
real SCSI devices to/from this dev handler (for pass-through dev
handlers).
- trace_level - allows enabling and disabling various tracing
facilities. See the content of this file for help on how to use it.
See also the section "Dealing with massive logs" for more info on how
to produce correct logs when you have enabled trace levels that
generate a lot of log data.
- type - SCSI type of the devices served by this dev handler.
See below for more information about the other entries of this
subdirectory for the standard SCST dev handlers.
The "sgv" subdirectory contains statistics for the SCST SGV caches. It
has the following entries:
- Zero or more subdirectories, one for each existing SGV cache.
- global_stats - file containing global SGV cache statistics.
Each SGV cache's subdirectory has the following item:
- stats - file containing statistics for this SGV cache.
The "targets" subdirectory contains a subdirectory for each SCST target.
The content of each target's subdirectory is target specific. See the
documentation for your target for more info about it, as well as the
SysfsRules file for more info about the rules common to all targets.
Every target should have at least the following entries:
- ini_groups - subdirectory which contains, and allows defining,
initiator-oriented access control information, see below.
- luns - subdirectory which contains the list of LUNs available under
target-oriented access control and allows defining it, see below.
- sessions - subdirectory containing the sessions connected to this
target.
- comment - this attribute can be used to store any human-readable info
that helps identify the target, for instance, the target's mapping to
the corresponding hardware port. It isn't used by SCST in any way.
- enabled - using this attribute you can enable or disable this target.
It allows you to finish configuring the target before it starts
accepting new connections. 0 by default.
- addr_method - the LUN addressing method used. Possible values:
"Peripheral" and "Flat". Most initiators work well with the Peripheral
addressing method (default), but some (HP-UX, for instance) may
require the Flat method. This attribute is also available in the
initiator security groups, so you can assign the addressing method
on a per-initiator basis.
- cpu_mask - defines the CPU affinity mask for threads serving this
target. For threads serving LUNs it is used only for devices with
threads_pool_type "per_initiator".
- io_grouping_type - defines how I/O from sessions to this target is
grouped together. This I/O grouping is very important for
performance. By setting this attribute to the right value, you can
considerably increase the performance of your setup. This grouping is
performed only if you use the CFQ I/O scheduler on the target and only
for devices with threads_num >= 0 and, if threads_num > 0, with
threads_pool_type "per_initiator". Possible values:
"this_group_only", "never", "auto", or an I/O group number >0. When
the value is "this_group_only", all I/O from all sessions in this
target will be grouped together. When the value is "never", I/O from
different sessions will not be grouped together, i.e. all sessions in
this target will have separate dedicated I/O groups. When the value
is "auto" (default), all I/O from initiators with the same name
(iSCSI initiator name, for instance) in all targets will be grouped
together, with a separate dedicated I/O group for each initiator name.
For iSCSI this mode works well, but other transports usually use
different initiator names for different sessions, so when using such
transports in MPIO configurations you should use either the value
"this_group_only" or an explicit I/O group number. This attribute is
also available in the initiator security groups, so you can assign
the I/O grouping on a per-initiator basis. See below for more info on
how to use this attribute.
- rel_tgt_id - allows reading or writing the SCSI Relative Target Port
Identifier attribute. This identifier is used to identify SCSI Target
Ports by some SCSI commands, mainly by Persistent Reservations
commands. This identifier must be unique among all SCST targets, but
for convenience SCST allows disabled targets to have a non-unique
rel_tgt_id. In this case SCST will not allow this target to be enabled
until rel_tgt_id becomes unique. By default, SCST initializes this
attribute to a unique value.
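Since a target stays disabled until its "enabled" attribute is set, a
typical last configuration step is to label the target and then enable
it. The sketch below shows one way to do that; the function name and
the SCST_ROOT override are hypothetical. Only the first line of
"enabled" is read back, because an attribute may carry a "[key]" mark
in its last line.

```shell
# Root of the SCST sysfs tree; overridable for dry runs (assumption).
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"

# scst_target_enable DRIVER TARGET [COMMENT]
# Optionally store COMMENT, then let the target accept connections.
scst_target_enable() {
    tgt_dir="$SCST_ROOT/targets/$1/$2"
    [ -d "$tgt_dir" ] || { echo "no such target: $1/$2" >&2; return 1; }
    # Free-form note for humans; SCST itself ignores it.
    if [ -n "$3" ]; then
        echo "$3" >"$tgt_dir/comment" || return 1
    fi
    echo 1 >"$tgt_dir/enabled" || return 1
    # Report the resulting state.
    echo "$1/$2 enabled=$(head -n 1 "$tgt_dir/enabled")"
}
```

For example, scst_target_enable iscsi iqn.2006-10.net.vlnb:tgt1 "port A"
would store the comment "port A" and enable that iSCSI target.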
A target driver may also have the following entries:
- "hw_target" - if the target driver supports both hardware and virtual
targets (for instance, an FC adapter supporting NPIV, which has
hardware targets for its physical ports as well as virtual NPIV
targets), this read-only attribute exists for all hardware targets
and contains the value 1.
The "sessions" subdirectory contains one subdirectory for each connected
session, with a name equal to the name of the connected initiator.
Each session subdirectory contains the following entries:
- initiator_name - contains the initiator name.
- force_close - optional write-only attribute which allows forcibly
closing this session.
- active_commands - contains the number of active SCSI commands in this
session, i.e. commands not yet executed or being executed.
- commands - contains the overall number of SCSI commands in this session.
- latency - if CONFIG_SCST_MEASURE_LATENCY is enabled, contains latency
statistics for this session.
- luns - a link pointing to the corresponding set of LUNs (security
group) to which this session is attached.
- One or more "lunX" subdirectories, where 'X' is a number, for each LUN
this session has (see below).
- Other target driver specific attributes and subdirectories.
See the description of the VDISK's sysfs interface below for samples.
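The per-session entries listed above are enough to build a quick
overview of who is connected to a target. The following sketch prints
the initiator name and the number of active commands for each session;
the function name and the SCST_ROOT override are hypothetical.

```shell
# Root of the SCST sysfs tree; overridable for dry runs (assumption).
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"

# scst_list_sessions DRIVER TARGET
# Print "initiator_name active_commands" for each session of the target.
scst_list_sessions() {
    for sess in "$SCST_ROOT/targets/$1/$2/sessions"/*/; do
        # Skip the unexpanded glob and anything that is not a session.
        [ -f "${sess}initiator_name" ] || continue
        name=$(head -n 1 "${sess}initiator_name")
        active=$(head -n 1 "${sess}active_commands")
        echo "$name $active"
    done
}
```

Only the first line of each attribute is read, so a trailing "[key]"
mark, if present, does not end up in the output.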
Access and devices visibility management (LUN masking)
------------------------------------------------------
Access and device visibility management allows an initiator or
group of initiators to see different devices with different LUNs
and the necessary access permissions.
SCST supports two modes of access control:
1. Target-oriented. In this mode you define for each target a default
set of LUNs, which are accessible to all initiators connected to that
target. This is the regular access control mode, which people usually
have in mind when thinking about access control in general. For
instance, in IET this is the only supported mode.
2. Initiator-oriented. In this mode you define which LUNs are accessible
to each initiator. For each set of one or more initiators which should
access the same set of devices with the same LUNs, you create a
separate security group, then add to it the devices and the names of
the allowed initiator(s).
Both modes can be used simultaneously. In this case the
initiator-oriented mode has higher priority than the target-oriented
one, i.e. initiators are first searched for in all security groups
defined for this target and, if none matches, the target's default set
of LUNs is used. This set of LUNs may be empty, in which case the
initiator will not see any LUNs from the target.
You can find out at any time which set of LUNs each session is assigned
to by looking at where the link
/sys/kernel/scst_tgt/targets/target_driver/target_name/sessions/initiator_name/luns
points.
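Checking where that link points can be done with readlink. A minimal
sketch follows; the function name and the SCST_ROOT override are
hypothetical, and whether the kernel reports the link target as a
relative or an absolute path is not specified here.

```shell
# Root of the SCST sysfs tree; overridable for dry runs (assumption).
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"

# scst_session_luns DRIVER TARGET INITIATOR
# Print the target of the session's "luns" link.
scst_session_luns() {
    readlink "$SCST_ROOT/targets/$1/$2/sessions/$3/luns"
}
```

A result that leads into an ini_groups subdirectory would indicate that
the session was matched by that security group; otherwise the target's
default set of LUNs is presumably in use.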
To configure target-oriented access control, SCST provides the
following interface. Each target's sysfs subdirectory
(/sys/kernel/scst_tgt/targets/target_driver/target_name) has a "luns"
subdirectory. This subdirectory contains the list of already defined
target-oriented access control LUNs for this target as well as the file
"mgmt". This file accepts the following commands, which you can send to
it, for instance, using the "echo" shell command. You can always get
brief help about the supported commands by reading this file.
"Parameters" are one or more param_name=value pairs separated by ';'.
- "add H:C:I:L lun [parameters]" - adds the pass-through device with
host:channel:id:lun as LUN "lun". Optionally, the device can be
marked as read only by using the parameter "read_only". The recommended
way to find out the H:C:I:L numbers is to use the lsscsi utility.
- "replace H:C:I:L lun [parameters]" - replaces the existing device at
LUN "lun" with the pass-through device with host:channel:id:lun and
generates an INQUIRY DATA HAS CHANGED Unit Attention. If the old
device doesn't exist, this command acts as the "add" command.
Optionally, the device can be marked as read only by using the
parameter "read_only". The recommended way to find out the H:C:I:L
numbers is to use the lsscsi utility.
- "add VNAME lun [parameters]" - adds the virtual device with name VNAME
as LUN "lun". Optionally, the device can be marked as read only
by using the parameter "read_only".
- "replace VNAME lun [parameters]" - replaces the existing device at
LUN "lun" with the virtual device with name VNAME and generates an
INQUIRY DATA HAS CHANGED Unit Attention. If the old device doesn't
exist, this command acts as the "add" command. Optionally, the device
can be marked as read only by using the parameter "read_only".
- "del lun" - deletes LUN "lun".
- "clear" - clears the list of devices.
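The "add" and "replace" commands above share one shape, so they can be
wrapped in a small helper that also joins optional parameters with ';'.
This is a sketch only: the function name and the SCST_ROOT override are
hypothetical.

```shell
# Root of the SCST sysfs tree; overridable for dry runs (assumption).
SCST_ROOT="${SCST_ROOT:-/sys/kernel/scst_tgt}"

# scst_lun_cmd DRIVER TARGET ACTION DEVICE LUN [PARAM=VALUE...]
# ACTION is "add" or "replace"; DEVICE is H:C:I:L or a virtual device name.
scst_lun_cmd() {
    [ "$#" -ge 5 ] || { echo "usage: scst_lun_cmd DRIVER TARGET ACTION DEVICE LUN [PARAM=VALUE...]" >&2; return 1; }
    mgmt="$SCST_ROOT/targets/$1/$2/luns/mgmt"
    cmd="$3 $4 $5"
    shift 5
    # Join the optional parameters with ';' as the mgmt file expects.
    params=""
    for p in "$@"; do
        params="${params:+$params;}$p"
    done
    echo "${cmd}${params:+ $params}" >"$mgmt"
}
```

For example, "scst_lun_cmd iscsi TARGET replace dev2 0 read_only=1"
writes "replace dev2 0 read_only=1" to the luns/mgmt file of TARGET,
where TARGET stands for an existing target name.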
To configure initiator-oriented access control, SCST provides the
following interface. Each target's sysfs subdirectory
(/sys/kernel/scst_tgt/targets/target_driver/target_name) has an
"ini_groups" subdirectory. This subdirectory contains the list of
already defined security groups for this target as well as the file
"mgmt". This file accepts the following commands, which you can send to
it, for instance, using the "echo" shell command. You can always get
brief help about the supported commands by reading this file.
- "create GROUP_NAME" - creates a new security group.
- "del GROUP_NAME" - deletes a security group.
Each security group's subdirectory contains 2 subdirectories, initiators
and luns, as well as the following attributes: addr_method, cpu_mask and
io_grouping_type. See their descriptions above.
Each "initiators" subdirectory contains the list of initiators added to
this group as well as the file "mgmt". This file accepts the following
commands, which you can send to it, for instance, using the "echo" shell
command. You can always get brief help about the supported commands by
reading this file.
- "add INITIATOR_NAME" - adds the initiator with name INITIATOR_NAME to
the group.
- "del INITIATOR_NAME" - deletes the initiator with name INITIATOR_NAME
from the group.
- "move INITIATOR_NAME DEST_GROUP_NAME" - moves the initiator with name
INITIATOR_NAME from the current group to the group with name
DEST_GROUP_NAME.
- "clear" - deletes all initiators from this group.
For the "add" and "del" commands, INITIATOR_NAME can be a simple
DOS-type pattern containing the '*' and '?' symbols. '*' matches any
sequence of symbols, '?' matches any single symbol. For instance,
"blah.xxx" will match "bl?h.*". Additionally, you can use the negation
sign '!' to invert the result of the pattern. For instance, "ah.xxx"
will match "!bl?h.*".
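The pattern rules above can be tried out without a running target by
using the shell's own glob matching, which treats '*' and '?' the same
way. This is an approximation and not SCST's matcher: shell globs also
give special meaning to '[' and '\', which SCST may not. The function
name is hypothetical.

```shell
# scst_pattern_match PATTERN NAME
# Succeeds if NAME is matched by PATTERN ('*', '?', optional leading '!').
scst_pattern_match() {
    pattern=$1
    negate=0
    case $pattern in
        '!'*)
            negate=1
            pattern=${pattern#?}   # strip the leading '!'
            ;;
    esac
    # An unquoted variable used as a "case" pattern is treated as a glob.
    case $2 in
        $pattern) matched=1 ;;
        *) matched=0 ;;
    esac
    # A leading '!' flips the result.
    [ "$matched" -ne "$negate" ]
}
```

With this helper, scst_pattern_match 'bl?h.*' blah.xxx and
scst_pattern_match '!bl?h.*' ah.xxx both succeed, mirroring the two
examples in the text.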
Each "luns" subdirectory contains the list of already defined LUNs for
this group as well as the file "mgmt". The content of this file and the
list of commands available in it are fully identical to those of the
"luns" subdirectory of the target-oriented access control.
Examples:
- echo "create INI" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/mgmt -
creates security group INI for target iqn.2006-10.net.vlnb:tgt1.
- echo "add 2:0:1:0 11" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI/luns/mgmt -
adds a pass-through device sitting on host 2, channel 0, ID 1, LUN 0
to group with name INI as LUN 11.
- echo "add disk1 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI/luns/mgmt -
adds a virtual disk with name disk1 to group with name INI as LUN 0.
- echo "add 21:*:e0:?b:83:*" >/sys/kernel/scst_tgt/targets/21:00:00:a0:8c:54:52:12/ini_groups/INI/initiators/mgmt -
adds a pattern to group with name INI to Fibre Channel target with
WWN 21:00:00:a0:8c:54:52:12, which matches WWNs of Fibre Channel
initiator ports.
Suppose you need an iSCSI target with name
"iqn.2007-05.com.example:storage.disk1.sys1.xyz", which should export
virtual device "dev1" with LUN 0 and virtual device "dev2" with LUN 1,
but initiator with name
"iqn.2007-05.com.example:storage.disk1.spec_ini.xyz" should see only
virtual device "dev2" read only with LUN 0. To achieve that you should
run the following commands:
# echo "add_target iqn.2007-05.com.example:storage.disk1.sys1.xyz" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
# echo "add dev1 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/luns/mgmt
# echo "add dev2 1" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/luns/mgmt
# echo "create SPEC_INI" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/ini_groups/mgmt
# echo "add dev2 0 read_only=1" \
>/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/ini_groups/SPEC_INI/luns/mgmt
# echo "add iqn.2007-05.com.example:storage.disk1.spec_ini.xyz" \
>/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/ini_groups/SPEC_INI/initiators/mgmt
For Fibre Channel or SAS in the above example you should use the WWNs
of the target and initiator ports instead of iSCSI names.
It is highly recommended to use the scstadmin utility instead of the
low level interface described in this section.
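If you nevertheless script the low level interface, the group-related
steps of the example above can be wrapped in small helpers. The
following is only a sketch: SCST_ROOT, SCST_DRY_RUN, scst_send and
restrict_initiator are hypothetical names introduced here, not part of
SCST. With SCST_DRY_RUN=1 the helpers only print what they would write.

```shell
#!/bin/sh
# Sketch of the group-related steps above, wrapped in small helpers.
# SCST_ROOT and SCST_DRY_RUN are hypothetical names used only here.
SCST_ROOT=${SCST_ROOT:-/sys/kernel/scst_tgt}

# scst_send MGMT_FILE COMMAND
# Writes COMMAND to MGMT_FILE (path relative to SCST_ROOT), or only
# prints what would be written when SCST_DRY_RUN=1.
scst_send() {
    if [ "${SCST_DRY_RUN:-0}" = 1 ]; then
        printf 'echo "%s" >%s/%s\n' "$2" "$SCST_ROOT" "$1"
    else
        echo "$2" >"$SCST_ROOT/$1"
    fi
}

# restrict_initiator TARGET GROUP INITIATOR DEVICE
# Creates GROUP on iSCSI target TARGET, exports DEVICE to it read only
# as LUN 0, then adds INITIATOR to the group. Stops at the first error.
restrict_initiator() {
    tgt=targets/iscsi/$1
    scst_send "$tgt/ini_groups/mgmt" "create $2" &&
    scst_send "$tgt/ini_groups/$2/luns/mgmt" "add $4 0 read_only=1" &&
    scst_send "$tgt/ini_groups/$2/initiators/mgmt" "add $3"
}
```

A dry run for the example above would be:
SCST_DRY_RUN=1 restrict_initiator \
iqn.2007-05.com.example:storage.disk1.sys1.xyz SPEC_INI \
iqn.2007-05.com.example:storage.disk1.spec_ini.xyz dev2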
IMPORTANT
=========
There must be LUN 0 in each set of LUNs, i.e. LU numbering must not
start from, e.g., 1. Otherwise you will see no devices on remote
initiators and SCST core will write the following message into the
kernel log: "tgt_dev for LUN 0 not found, command to unexisting LU?"
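A quick way to catch this mistake is to scan a target's sysfs
subdirectory for sets of LUNs that have entries but no entry "0". The
sketch below assumes, as the "exported" links shown later in this
document suggest, that each LUN appears as an entry named by its number
inside the "luns" subdirectory. The function name report_missing_lun0
is hypothetical.

```shell
#!/bin/sh
# report_missing_lun0 TARGET_DIR
# Prints every "luns" directory under TARGET_DIR (the target's own and
# those of its security groups) that defines LUNs but lacks LUN 0.
# Returns 1 if any such directory was found, 0 otherwise.
report_missing_lun0() {
    status=0
    for luns in "$1/luns" "$1"/ini_groups/*/luns; do
        [ -d "$luns" ] || continue      # also skips an unmatched glob
        has_lun=0
        for entry in "$luns"/[0-9]*; do
            if [ -e "$entry" ]; then
                has_lun=1
                break
            fi
        done
        # Empty sets of LUNs are left alone; only sets that start
        # from a non-zero LUN are reported.
        if [ "$has_lun" = 1 ] && [ ! -e "$luns/0" ]; then
            echo "no LUN 0 in $luns"
            status=1
        fi
    done
    return "$status"
}
```

For example:
report_missing_lun0 /sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1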
IMPORTANT
=========
All the access control must be fully configured BEFORE the corresponding
target is enabled. When you enable a target, it will immediately start
accepting new connections, hence creating new sessions, and those new
sessions will be assigned to security groups according to the
*currently* configured access control settings. For instance, a session
may be assigned to the default target's set of LUNs instead of the
"HOST004" group you need, because "HOST004" doesn't exist yet. So, you
must configure all the security groups before new connections from the
initiators are created, i.e. before the target is enabled.
VDISK device handler
--------------------
VDISK has 4 built-in dev handlers: vdisk_fileio, vdisk_blockio,
vdisk_nullio and vcdrom. Roots of their sysfs interface are
/sys/kernel/scst_tgt/handlers/handler_name, e.g. for vdisk_fileio:
/sys/kernel/scst_tgt/handlers/vdisk_fileio. Each root has the following
entries:
- Zero or more links to devices, each named after the corresponding
device.
- trace_level - allows to enable and disable various tracing
facilities. See the content of this file for help on how to use it. See
also section "Dealing with massive logs" for more info on how to
collect complete logs when the enabled trace levels produce a lot of
log data.
- mgmt - main management entry, which allows to add/delete VDISK
devices with the corresponding type.
The "mgmt" file has the following commands, which you can send to it,
for instance, using "echo" shell command. You can always get a small
help about supported commands by looking inside this file. "Parameters"
are one or more param_name=value pairs separated by ';'.
- echo "add_device device_name [parameters]" - adds a virtual device
with name device_name and specified parameters (see below)
- echo "del_device device_name" - deletes a virtual device with name
device_name.
Handler vdisk_fileio provides FILEIO mode to create virtual devices.
This mode uses files as backend and accesses them using regular
read()/write() file calls. This allows to use the full power of the
Linux page cache. The following parameters are possible for vdisk_fileio:
- filename - specifies path and file name of the backend file. The path
must be absolute.
- blocksize - specifies block size used by this virtual device. The
block size must be power of 2 and >= 512 bytes. Default is 512.
- write_through - disables write back caching. Note, this option
makes sense only if you also *manually* disable write-back cache in
*all* your backstorage devices and make sure it's actually disabled,
since many devices are known to lie about this mode to get better
benchmark results. Default is 0.
- read_only - read only. Default is 0.
- o_direct - disables both read and write caching. This mode isn't
currently fully implemented, you should use user space fileio_tgt
program in O_DIRECT mode instead (see below).
- nv_cache - enables "non-volatile cache" mode. In this mode it is
assumed that the target has a GOOD UPS with ability to cleanly
shut down the target in case of power failure and that it is free of
software/hardware bugs, i.e. all data from the target's cache are
guaranteed sooner or later to go to the media. Hence all data
synchronization with media operations, like SYNCHRONIZE_CACHE, are
ignored in order to bring more performance. Also in this mode target
reports to initiators that the corresponding device has write-through
cache to disable all write-back cache workarounds used by initiators.
Use with extreme caution, since in this mode after a crash of the
target journaled file systems don't guarantee the consistency after
journal recovery, therefore manual fsck MUST be run. Note, that since
the journal barrier protection (see "IMPORTANT" note below) is usually
turned off, enabling NV_CACHE could change nothing from data protection
point of view, since no data synchronization with media operations
will go from the initiator. This option overrides "write_through"
option. Disabled by default.
- thin_provisioned - enables thin provisioning facility, when remote
initiators can unmap blocks of storage, if they don't need them
anymore. Backend storage also must support this facility.
- removable - with this flag set the device is reported to remote
initiators as removable.
Handler vdisk_blockio provides BLOCKIO mode to create virtual devices.
This mode performs direct block I/O with a block device, bypassing the
page cache for all operations. This mode works ideally with high-end
storage HBAs and for applications that either do not need caching
between application and disk or need the large block throughput. See
below for more info.
The following parameters are possible for vdisk_blockio: filename,
blocksize, nv_cache, read_only, removable, thin_provisioned. See
vdisk_fileio above for description of those parameters.
Handler vdisk_nullio provides NULLIO mode to create virtual devices. In
this mode no real I/O is done, but success is returned to initiators.
It is intended to be used for performance measurements in the same way
as "*_perf" handlers. The following parameters are possible for
vdisk_nullio: blocksize, read_only, removable. See vdisk_fileio above
for description of those parameters.
Handler vcdrom allows emulation of a virtual CDROM device using an ISO
file as backend. It doesn't have any parameters.
For example:
echo "add_device disk1 filename=/disk1; blocksize=4096; nv_cache=1" >/sys/kernel/scst_tgt/handlers/vdisk_fileio/mgmt
will create a FILEIO virtual device disk1 with backend file /disk1
with block size 4K and NV_CACHE enabled.
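When such commands are generated by a script, it helps to build the
parameter list and to check the block size rule (power of 2 and >= 512)
before writing to "mgmt". The helpers below are a sketch under that
assumption. Their names, is_valid_blocksize and add_device_cmd, are
hypothetical, and they only print the command without touching sysfs.

```shell
#!/bin/sh
# is_valid_blocksize N
# Returns 0 if N is a power of 2 and at least 512, as required above.
is_valid_blocksize() {
    case $1 in
        ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero or non-numeric
    esac
    [ "$1" -ge 512 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# add_device_cmd NAME [PARAM=VALUE ...]
# Prints an "add_device" command with the parameters joined by "; ".
add_device_cmd() {
    cmd="add_device $1"
    shift
    sep=" "
    for p in "$@"; do
        cmd="$cmd$sep$p"
        sep="; "
    done
    printf '%s\n' "$cmd"
}
```

For instance, after is_valid_blocksize 4096 succeeds, the output of
add_device_cmd disk1 filename=/disk1 blocksize=4096 nv_cache=1
can be written to /sys/kernel/scst_tgt/handlers/vdisk_fileio/mgmt.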
Each vdisk_fileio's device has the following attributes in
/sys/kernel/scst_tgt/devices/device_name:
- filename - contains path and file name of the backend file.
- blocksize - contains block size used by this virtual device.
- write_through - contains status of write back caching of this virtual
device.
- read_only - contains read only status of this virtual device.
- o_direct - contains O_DIRECT status of this virtual device.
- nv_cache - contains NV_CACHE status of this virtual device.
- thin_provisioned - contains thin provisioning status of this virtual
device.
- removable - contains removable status of this virtual device.
- size_mb - contains size of this virtual device in MB.
- t10_dev_id - contains and allows to set T10 vendor specific
identifier for Device Identification VPD page (0x83) of INQUIRY data.
By default VDISK handler always generates t10_dev_id for every new
created device at creation time based on the device name and
scst_vdisk_ID scst_vdisk.ko module parameter (see below).
- usn - contains the virtual device's serial number of INQUIRY data. It
is created at the device creation time based on the device name and
scst_vdisk_ID scst_vdisk.ko module parameter (see below).
- type - contains SCSI type of this virtual device.
- resync_size - write only attribute, which makes vdisk_fileio
rescan the size of the backend file. It is useful if you changed it,
for instance, if you resized the file.
For example:
/sys/kernel/scst_tgt/devices/disk1
|-- blocksize
|-- exported
| |-- export0 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt/luns/0
| |-- export1 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt/ini_groups/INI/luns/0
| |-- export2 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt1/luns/0
| |-- export3 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI1/luns/0
| |-- export4 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI2/luns/0
|-- filename
|-- handler -> ../../handlers/vdisk_fileio
|-- nv_cache
|-- o_direct
|-- read_only
|-- removable
|-- resync_size
|-- size_mb
|-- t10_dev_id
|-- thin_provisioned
|-- threads_num
|-- threads_pool_type
|-- type
|-- usn
`-- write_through
Each vdisk_blockio's device has the following attributes in
/sys/kernel/scst_tgt/devices/device_name: blocksize, filename, nv_cache,
read_only, removable, resync_size, size_mb, t10_dev_id,
thin_provisioned, threads_num, threads_pool_type, type, usn. See above
description of those parameters.
Each vdisk_nullio's device has the following attributes in
/sys/kernel/scst_tgt/devices/device_name: blocksize, read_only,
removable, size_mb, t10_dev_id, threads_num, threads_pool_type, type,
usn. See above description of those parameters.
Each vcdrom's device has the following attributes in
/sys/kernel/scst_tgt/devices/device_name: filename, size_mb,
t10_dev_id, threads_num, threads_pool_type, type, usn. See above
description of those parameters. The exception is the filename
attribute: for vcdrom it is writable. Writing to it allows to virtually
insert or change virtual CD media in the virtual CDROM device. For example:
- echo "/image.iso" >/sys/kernel/scst_tgt/devices/cdrom/filename - will
insert file /image.iso as virtual media to the virtual CDROM cdrom.
- echo "" >/sys/kernel/scst_tgt/devices/cdrom/filename - will remove
"media" from the virtual CDROM cdrom.
Additionally VDISK handler has module parameter "num_threads", which
specifies count of I/O threads for each FILEIO VDISK's or VCDROM device.
If you have a workload, which tends to produce rather random accesses
(e.g. DB-like), you should increase this count to a bigger value, like
32. If you have a rather sequential workload, you should decrease it to
a lower value, like number of CPUs on the target or even 1. Due to some
limitations of Linux I/O subsystem, increasing number of I/O threads too
much leads to sequential performance drop, especially with deadline
scheduler, so decreasing it can improve sequential performance. The
default provides a good compromise between random and sequential
accesses.
You shouldn't be afraid to have too many VDISK I/O threads if you have
many VDISK devices. Kernel threads consume very few resources (several
KBs) and only necessary threads will be used by SCST, so the threads
will not thrash your system.
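As a sketch of how the guidance above could be turned into a setting,
the helper below prints a modprobe "options" line for the scst_vdisk
module. The workload labels "random" and "sequential" and the function
name vdisk_threads_option are hypothetical. The values follow the
suggestions above: 32 for random access, the number of online CPUs for
sequential access. A module parameter set this way takes effect when
the module is loaded.

```shell
#!/bin/sh
# vdisk_threads_option WORKLOAD
# Prints a modprobe "options" line for scst_vdisk. WORKLOAD is "random"
# or "sequential"; both labels are hypothetical names for this sketch.
vdisk_threads_option() {
    case $1 in
        random)
            n=32 ;;                     # value suggested above
        sequential)
            # Number of online CPUs, falling back to 1 if unknown.
            n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
            ;;
        *)
            echo "unknown workload: $1" >&2
            return 1 ;;
    esac
    echo "options scst_vdisk num_threads=$n"
}
```

The printed line can be placed in a modprobe configuration file, or the
same num_threads=N argument can be passed when loading the module by
hand.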
CAUTION: If you partitioned/formatted your device with block size X, *NEVER*
======== ever try to export and then mount it (even accidentally) with another
block size. Otherwise you can *instantly* damage it pretty
badly as well as all your data on it. Messages on initiator
like: "attempt to access beyond end of device" is the sign of
such damage.
Moreover, if you want to compare how well different block sizes
work for you, you **MUST** EVERY TIME AFTER CHANGING BLOCK SIZE
**COMPLETELY** **WIPE OFF** ALL THE DATA FROM THE DEVICE. In
other words, THE **WHOLE** DEVICE **MUST** HAVE ONLY **ZEROS**
AS THE DATA AFTER YOU SWITCH TO NEW BLOCK SIZE. Switching block
sizes isn't like switching between FILEIO and BLOCKIO, after
changing block size all previously written with another block
size data MUST BE ERASED. Otherwise you will have a full set of
very weird behaviors, because blocks addressing will be
changed, but initiators in most cases will not have a
possibility to detect that old addresses written on the device
in, e.g., partition table, don't refer anymore to what they are
intended to refer.
IMPORTANT: Some disk and partition table management utilities don't support
========= block sizes >512 bytes, therefore make sure that your favorite one
supports it. Currently only cfdisk is known to work only with
512 bytes blocks, other utilities like fdisk on Linux or
standard disk manager on Windows are proved to work well with
non-512 bytes blocks. Note, if you export a disk file or
device with some block size, different from one, with which
it was already partitioned, you could get various weird
things like utilities hang up or other unexpected behavior.
Hence, to be sure, zero the exported file or device before
the first access to it from the remote initiator with another
block size. On a Windows initiator make sure you "Set Signature"
in the disk manager on the drive imported from the target
before doing any other partitioning on it. After you have
successfully mounted a file system over a device with non-512
byte block size, the block size stops mattering: any program
will work with files on such a file system.
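The advice above to zero the exported file before exporting it with
another block size can be sketched as follows for a regular backing
file. This DESTROYS ALL DATA in the file. The function name
zero_backing_file is hypothetical, and the sketch deliberately refuses
block devices, which need a different way to obtain their size.

```shell
#!/bin/sh
# zero_backing_file FILE
# Overwrites a regular backing FILE with zeros, keeping its size.
# DESTROYS ALL DATA in FILE. Refuses anything but a regular file.
zero_backing_file() {
    if [ ! -f "$1" ]; then
        echo "not a regular file: $1" >&2
        return 1
    fi
    # Arithmetic expansion strips the padding some wc versions print.
    size=$(( $(wc -c <"$1") ))
    # Rewrite the file as exactly $size zero bytes.
    head -c "$size" /dev/zero >"$1"
}
```

Run it only while the device is not exported, for instance:
zero_backing_file /disk1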
Dealing with massive logs
-------------------------
If you want to use the "trace_level" file to enable logging levels
which produce a lot of events, like "debug", then to avoid losing
logged events you should also:
* Increase in .config of your kernel CONFIG_LOG_BUF_SHIFT variable
to much bigger value, then recompile it. For example, value 25 will
provide good protection from logging overflow even under high volume
of logging events. To use it you will need to modify the maximum
allowed value for CONFIG_LOG_BUF_SHIFT in the corresponding Kconfig
file to 25 as well.
* Change your /etc/syslog.conf, or the config file of your favorite
logging program, to store kernel logs asynchronously. For example,
you can add to rsyslog.conf the line "kern.info -/var/log/kernel" and
add "kern.none" to the line for /var/log/messages, so the resulting
line would look like:
"*.info;kern.none;mail.none;authpriv.none;cron.none /var/log/messages"
Persistent Reservations
-----------------------
SCST implements Persistent Reservations with full set of capabilities,
including "Persistence Through Power Loss".
The "Persistence Through Power Loss" data are saved in /var/lib/scst/pr
with files with names the same as the names of the corresponding
devices. Also this directory contains backup versions of those files
with suffix ".1". Those backup files are used in case of power or other
failure to prevent Persistent Reservation information from corruption
during update.
Persistent Reservations are available on all transports implementing
the get_initiator_port_transport_id() callback. Transports not
implementing this callback will act in one of 2 possible scenarios
("all or nothing"):
1. If a device has such transport connected and doesn't have persistent
reservations, it will refuse Persistent Reservations commands as if it
doesn't support them.
2. If a device has persistent reservations, all initiators newly
connecting via such transports will not see this device. After all
persistent reservations from this device are released, upon reconnect
the initiators will see it.
Caching
-------
By default for performance reasons VDISK FILEIO devices use write back
caching policy.
Generally, write back caching is safe to use and its danger is
greatly overestimated, because most modern (especially Enterprise
level) applications are well prepared to work with write back cached
storage. In particular, this applies to all transaction-based
applications. Those applications flush cache to completely avoid ANY
data loss on a crash or power failure. For instance, journaled file
systems flush cache on each meta data update, so they survive
power/hardware/software failures pretty well.
Since locally on initiators write back caching is always on, if an
application cares about its data consistency, it does flush the cache
when necessary, or on any write if it opens files with O_SYNC. If it
doesn't care, it doesn't flush the cache. As soon as the cache flushes
are propagated to the storage, write back caching on it doesn't make
any difference. If an application doesn't flush the cache, it's doomed
to lose data in case of a crash or power failure, no matter where this
cache is located, locally or on the storage.
To illustrate that consider, for example, a user who wants to copy /src
directory to /dst directory reliably, i.e. after the copy finished no