Main page

Custom sharding function

A custom sharding function determines the bucket number, that is, the location of the data in the cluster, which can then be passed to cluster operations. To build one you need:

  1. a hash function
    As an example, take the default function from tarantool/vshard: crc32 with a specific polynomial value. Java has no built-in crc32 that accepts a polynomial value, so we implement our own:
    // requires: import java.util.BitSet;
    private static long crc32(byte[] data) {
        BitSet bitSet = BitSet.valueOf(data); // bits of each byte, least significant first
        int crc32 = 0xFFFFFFFF; // initial value
        for (int i = 0; i < data.length * 8; i++) {
            if (((crc32 >>> 31) & 1) != (bitSet.get(i) ? 1 : 0)) {
                crc32 = (crc32 << 1) ^ 0x1EDC6F41; // xor with polynomial
            } else {
                crc32 = crc32 << 1;
            }
        }
        crc32 = Integer.reverse(crc32); // reflect the result
        return crc32 & 0x00000000ffffffffL; // widen to long, since Java has no unsigned int
    }
  2. the number of buckets
    This number can be obtained from Tarantool by calling the vshard.router.bucket_count function of the vshard module:
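    // Cached bucket count; this field declaration is assumed, the original snippet omits it.
    private static Optional<Integer> bucketCount = Optional.empty();

    public static <T extends Packable, R extends Collection<T>> Integer getBucketCount(
        TarantoolClient<T, R> client) throws ExecutionException, InterruptedException {
        if (!bucketCount.isPresent()) {
            // Ask the router once and reuse the answer on later calls.
            bucketCount = Optional.ofNullable(
                client.callForSingleResult("vshard.router.bucket_count", Integer.class).get()
            );
        }
        return bucketCount.orElseThrow(() -> new TarantoolClientException("Failed to get bucket count"));
    }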
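
The hash function from step 1 can be sanity-checked against the JDK. The sketch below assumes Java 9 or later, where `java.util.zip.CRC32C` is available, and assumes that the only difference between that class and the hand-written function is the final bit inversion that `CRC32C` applies. The class name `Crc32CrossCheck` and its method names are hypothetical and not part of the driver.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32C;

// Hypothetical helper class, not part of the driver or of vshard.
final class Crc32CrossCheck {

    // The bit-by-bit function from step 1, with the same behavior.
    static long bitwise(byte[] data) {
        BitSet bits = BitSet.valueOf(data); // bit i is bit (i % 8) of byte (i / 8)
        int crc = 0xFFFFFFFF;               // initial value
        for (int i = 0; i < data.length * 8; i++) {
            boolean top = (crc >>> 31) == 1;
            crc <<= 1;
            if (top != bits.get(i)) {
                crc ^= 0x1EDC6F41;          // xor with polynomial
            }
        }
        return Integer.reverse(crc) & 0xFFFFFFFFL;
    }

    // Assumption: CRC32C differs from the function above only by a final inversion,
    // so undoing that inversion should give the same value.
    static long viaJdk(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue() ^ 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] sample = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("bitwise = 0x%08X%n", bitwise(sample));
        System.out.printf("jdk     = 0x%08X%n", viaJdk(sample));
    }
}
```

If the two printed values ever differ for your inputs, trust the value that Tarantool itself returns for the same bytes rather than either Java variant.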

Then we can determine the bucket id by passing the key through the hash function and taking the remainder of the division by the number of buckets. One is added to the remainder because bucket ids start at 1:

    public static <T extends Packable, R extends Collection<T>> Integer getBucketIdStrCRC32(
        TarantoolClient<T, R> client, List<Object> key) throws ExecutionException, InterruptedException {
        // Concatenate the string form of every non-null key part.
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        for (Object part : key) {
            try {
                if (part != null) {
                    outputStream.write(part.toString().getBytes());
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        // The remainder is in [0, bucketCount), so add 1 to get a valid bucket id.
        return Math.toIntExact(
            (crc32(outputStream.toByteArray()) % getBucketCount(client)) + 1
        );
    }
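
To check a bucket id by hand without a running cluster, the same arithmetic can be sketched with the bucket count passed in as a plain number. This is only an illustration under stated assumptions: the class `BucketIdSketch` and its methods are hypothetical, the hash is taken from `java.util.zip.CRC32C` (Java 9+) with its final inversion undone, on the assumption that this matches the hand-written crc32, and the key parts are encoded as UTF-8 explicitly, whereas `getBucketIdStrCRC32` uses the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical, driver-free helper for checking bucket ids by hand.
final class BucketIdSketch {

    // Assumption: equals the hand-written crc32 (CRC32C without the final inversion).
    static long hash(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue() ^ 0xFFFFFFFFL;
    }

    static int bucketId(List<?> key, int bucketCount) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Object part : key) {
            if (part != null) { // null parts are skipped, as in getBucketIdStrCRC32
                byte[] bytes = part.toString().getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
        }
        // Remainder is in [0, bucketCount); bucket ids start at 1.
        return Math.toIntExact(hash(out.toByteArray()) % bucketCount + 1);
    }

    public static void main(String[] args) {
        int bucketCount = 3000; // example value; the real one comes from vshard.router.bucket_count
        System.out.println(bucketId(Arrays.asList(1234, "56789"), bucketCount));
        System.out.println(bucketId(Arrays.asList("12345", null, "6789"), bucketCount));
    }
}
```

Both calls print the same bucket id, because the key parts are converted to strings and concatenated before hashing. The same property means that distinct keys such as `(12, 3)` and `(1, 23)` land in the same bucket, which is worth keeping in mind when choosing a composite key.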

After that we can pass the bucket id to operations:

    TarantoolSpaceOperations<TarantoolTuple, TarantoolResult<TarantoolTuple>> profileSpace =
        client.space(TEST_SPACE_NAME);

    TarantoolTuple tarantoolTuple = tupleFactory.create(1, null, "FIO", 50, 100);
    Conditions condition = Conditions.equals(PK_FIELD_NAME, 1);

    // Compute the bucket id from the primary key of the tuple.
    Integer bucketId = Utils.getBucketIdStrCRC32(client,
        Collections.singletonList(tarantoolTuple.getInteger(0)));

    // Insert with an explicit bucket id.
    InsertOptions insertOptions = ProxyInsertOptions.create().withBucketId(bucketId);
    TarantoolResult<TarantoolTuple> insertResult = profileSpace.insert(tarantoolTuple, insertOptions).get();
    assertEquals(1, insertResult.size());

    // Select without a bucket id.
    TarantoolResult<TarantoolTuple> selectResult = profileSpace.select(condition).get();
    assertEquals(1, selectResult.size());

    // Select with the same bucket id.
    SelectOptions selectOptions = ProxySelectOptions.create().withBucketId(bucketId);
    selectResult = profileSpace.select(condition, selectOptions).get();
    assertEquals(1, selectResult.size());