Should I use RAID-0 on my validator?

This is the question we set out to answer today. A Solana validator performs several kinds of disk activity, such as updating accounts, managing the ledger, and writing snapshots. A RAID-0 array stripes all of it across its member disks, so it handles load balancing for you. The alternative is to place different kinds of data on separate disks, and that is only worthwhile if the load is actually spread across different files and folders. To find out, we wrote several scripts to analyze the per-file load on a validator.
Our first script uses strace to attach to every Solana thread in turn and log its read and write system calls. Each thread is traced for 5 seconds, so a full pass takes a while (approximately 20 minutes).
get_all_threads.sh
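#!/bin/bash
# Trace the read/write syscalls of every validator thread, 5 seconds per thread.

SNAPSHOT_DATE=$(date +%s)

# -o picks the oldest match in case several processes match the name.
PARENT_PID=$(pgrep -o -f solana-validator)
for child_pid in $(ps --no-headers -o spid -T --pid "$PARENT_PID"); do
  echo "Dumping $child_pid"
  # timeout sends SIGTERM after 5 s, which lets strace detach from the thread
  # before exiting; kill -9 would give it no chance to detach cleanly.
  sudo timeout 5 strace -p "$child_pid" -e trace=read,write -e verbose=file \
    -o "strace_${SNAPSHOT_DATE}_${child_pid}_output.log"
done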
This is a sample result of the script:
strace_1716153738_18418_output.log
write(188, "\364a\4\f\322\7\0\255\303n%\215*&\274\205m\221\312Bd\313\222&\367\360%\355F\305\2335"..., 32768) = 32768
read(836, "\v38\240\253,\310A\325\260\24\274j<\367V)\30t\263\31\311Q}\233\277\251\344\351f\36\371"..., 8192) = 8192
write(188, "\21-\214\216{\233\234\"{\"\31\263\272\233\255\236\240[\270KT\224\n\240\200\265\337Y\250\330\362\230"..., 3137) = 3137
read(836, "\6\335\366\341\327e\241\223\331\313\341F\316\353y\254\34\264\205\355_[7\221:\214\365\205~\377\0\251"..., 8192) = 8192
read(836, "\377\377\377\377\377\377\377\377\6\335\366\341\327e\241\223\331\313\341F\316\353y\254\34\264\205\355_[7\221"..., 8192) = 8192
read(836, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(836, "\0\0\0\0\0\0\0\0\0\0\0\0\0LK@\0\0\0\0\0\0\0\0\0\0\0\0\202\257ID"..., 8192) = 8192
read(836, "\361\331=\256k\222\331@\204\255\36\4!\303J\274c\37\260\246\363\243d\360!\277L\306\34\1\0\0"..., 8192) = 8192
read(836, ":\214\365\205~\377\0\251\0\0\0\0\0\0\0\0\2042\4J\240\362\22D \341\327\221\323\351\4L"..., 8192) = 8192
read(836, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0t\302L\306\34\1\0\0"..., 8192) = 8192
read(836, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0"..., 8192) = 8192
read(836, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(836, "\0\341\365\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(836, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
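Each log line has the shape syscall(fd, buffer, requested) = returned. As a minimal sketch of how one such line reduces to numbers, the snippet below extracts the syscall name, the descriptor, and the byte count. LINE_RE and parse_line are illustrative names, not part of our scripts, and the failed-call line is a made-up example in strace's usual format.

```python
import re

# Hypothetical helper: captures the syscall name, the file descriptor, and
# the return value of a completed read()/write() line from the strace log.
LINE_RE = re.compile(r'^(read|write)\((\d+),.*=\s+(\d+)$')


def parse_line(line):
    """Return (syscall, fd, nbytes), or None for lines that do not match."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


# First line of the sample log, with the buffer contents shortened.
sample = r'write(188, "\364a\4\f\322\7\0\255"..., 32768) = 32768'
print(parse_line(sample))   # ('write', 188, 32768)

# Illustrative failed call: the return value is -1, so it is skipped.
failed = 'read(836, 0x7f0000000000, 8192) = -1 EAGAIN (Resource temporarily unavailable)'
print(parse_line(failed))   # None
```

Keeping the syscall name would make it possible to report reads and writes separately.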
Next, we need an analyzer to aggregate the information gathered by the script:
agg_strace.py
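from collections import Counter
import os
import re
import sys

# Matches completed calls such as `read(836, "..."..., 8192) = 8192`:
# group 1 is the file descriptor, group 2 the number of bytes transferred.
# Failed calls (`= -1 EAGAIN ...`) do not match and are skipped.
PATTERN = re.compile(r'\((\d+),.*=\s+(\d+)$')
# Seconds each thread is traced for in get_all_threads.sh.
MONITORING_INTERVAL_SEC = 5


def resolve_fd(pid, fd):
    """Map a file descriptor of process `pid` to the path it points at."""
    return os.path.realpath(f'/proc/{pid}/fd/{fd}')


def main():
    if len(sys.argv) < 3:
        sys.exit(f'usage: {sys.argv[0]} VALIDATOR_PID STRACE_LOG...')
    validator_pid = sys.argv[1]
    log_paths = sys.argv[2:]
    activity_counter = Counter()  # fd -> total bytes read or written
    for log_path in log_paths:
        with open(log_path) as f:
            for line in f:
                if m := PATTERN.search(line.strip()):
                    fd, op_sz = m.group(1), int(m.group(2))
                    activity_counter[fd] += op_sz
    # Busiest descriptors first; rates are in bytes per second.
    for fd, total_bytes in activity_counter.most_common():
        print(resolve_fd(validator_pid, fd), total_bytes / MONITORING_INTERVAL_SEC, 'b/s')


if __name__ == '__main__':
    main()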
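Note that resolve_fd looks up each descriptor only when the analyzer runs, which is after tracing has finished. A descriptor closed in the meantime no longer resolves, so its /proc path is printed as-is, and a reused descriptor number may resolve to a different file. One way to reduce this is to record the descriptor table while tracing is still in progress. The sketch below is Linux-only, snapshot_fds is a hypothetical helper rather than part of our scripts, and reading another user's /proc/PID/fd typically requires root.

```python
import os


def snapshot_fds(pid):
    """Return {fd: target} for every descriptor currently open in process `pid`."""
    fd_dir = f'/proc/{pid}/fd'
    mapping = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to the open file, socket, or anon inode.
            mapping[name] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return mapping


# Demonstration on the current process; for a validator, pass its PID instead.
for fd, target in sorted(snapshot_fds(os.getpid()).items(), key=lambda kv: int(kv[0])):
    print(fd, target)
```

Saving such a snapshot next to each strace log would let the analyzer map descriptors from the table captured at trace time instead of from the live process.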
Here is the report we generated. Rates are in bytes per second.
report.txt
/mnt/raid0/rocksdb/049723.sst 188032689.2 b/s
/mnt/raid0/rocksdb/049630.sst 70089117.2 b/s
/mnt/raid0/rocksdb/049769.sst 47659331.4 b/s
/mnt/raid0/rocksdb/049688.sst 2815851.6 b/s
/mnt/raid0/rocksdb/049732.sst 2224991.8 b/s
/mnt/raid0/rocksdb/049621.sst 2077543.8 b/s
/mnt/raid0/rocksdb/049622.sst 1602355.2 b/s
/mnt/raid0/rocksdb/049761.sst 1486883.2 b/s
/mnt/raid0/rocksdb/049763.sst 1452476.2 b/s
/proc/17703/fd/2184 883151.4 b/s
/mnt/raid0/rocksdb/049749.sst 353938.6 b/s
/proc/17703/fd/2171 341655.6 b/s
/proc/17703/fd/2198 294958.0 b/s
/mnt/raid0/rocksdb/049748.sst 191726.2 b/s
/mnt/raid0/rocksdb/049611.sst 144179.2 b/s
/mnt/raid0/rocksdb/049651.sst 108972.6 b/s
/mnt/raid0/rocksdb/049641.sst 88473.6 b/s
/home/linuxuser/solana.log 38537.6 b/s
/mnt/raid0/rocksdb/049770.log 9198.2 b/s
/mnt/raid0/rocksdb/049733.sst 574.2 b/s
/mnt/raid0/rocksdb/049699.sst 435.6 b/s
/mnt/raid0/rocksdb/049606.sst 151.4 b/s
/proc/17703/fd/socket:[216600004] 136.4 b/s
/proc/17703/fd/2035 59.8 b/s
/proc/17703/fd/2199 27.2 b/s
/proc/17703/fd/anon_inode:[eventfd] 22.4 b/s
/proc/17703/fd/anon_inode:[eventfd] 19.2 b/s
/proc/17703/fd/anon_inode:[eventfd] 17.6 b/s
/proc/17703/fd/2200 3.6 b/s
Our analysis shows that the ledger, which is essentially a list of transactions stored in RocksDB SST files, accounts for the vast majority of the read and write traffic we observed. Based on these findings, we see no compelling reason to avoid RAID-0: with nearly all of the load concentrated in a single folder, placing folders on separate disks would not distribute it any better.
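To put a number on that, the report can be grouped by parent directory. The following is a minimal sketch, assuming report lines in the "path rate unit" format shown above; share_by_directory is a hypothetical helper, and the sample uses only a few lines from the report.

```python
from collections import defaultdict
import os


def share_by_directory(report_lines):
    """Sum the rate per parent directory and return {directory: fraction of total}."""
    totals = defaultdict(float)
    for line in report_lines:
        parts = line.rsplit(maxsplit=2)  # path, rate, unit
        if len(parts) != 3:
            continue
        path, rate = parts[0], float(parts[1])
        totals[os.path.dirname(path)] += rate
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {directory: total / grand_total for directory, total in totals.items()}


# A few lines taken from report.txt above.
sample = [
    '/mnt/raid0/rocksdb/049723.sst 188032689.2 b/s',
    '/mnt/raid0/rocksdb/049630.sst 70089117.2 b/s',
    '/mnt/raid0/rocksdb/049769.sst 47659331.4 b/s',
    '/proc/17703/fd/2184 883151.4 b/s',
    '/home/linuxuser/solana.log 38537.6 b/s',
]
shares = share_by_directory(sample)
for directory, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f'{directory}: {share:.1%}')
```

On this sample the rocksdb directory accounts for more than 99% of the traffic, which is why moving the other folders to their own disks would change little.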