Showing posts from December, 2017

cat abuse with split

# cat gets used all the time just to pipe the contents of a
# file to stdout.
# But it is actually for con-cat-enating files.
# It has a partner in crime called split.
# Working together they are very powerful for parallel processing:
#     split -> do work in parallel -> cat
# split then cat will produce an output file which is identical to the input.

# Make a 1 gig file of random bytes on my external SSD.
time head -c $(( 1024 * 1024 * 1024 )) /dev/urandom > $(mktemp '/DataSwap/big.XXXXXXXX')

real    0m5.738s
user    0m0.057s
sys     0m5.680s
# Yey - that was fast!

[aturner@Alexanders-MBP ~]$ split -b $(( 1024 * 1024 )) /DataSwap/big.* '/DataSwap/parts'
[aturner@Alexanders-MBP ~]$ ls /DataSwap/parts*
/DataSwap/partsaa  /DataSwap/partsgp  /DataSwap/partsne  /DataSwap/partstt  /DataSwap/partszabi  /DataSwap/partszahx
/DataSwap/partsab  /DataSwap/partsgq  /DataSwap/partsnf  /DataSwap/partstu  /DataSwap/partszabj  /DataSwap/partszahy
/DataSwap/partsac  /DataS
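# The preview is cut off before the cat half of the round trip, so here is
# a minimal sketch of it. Only the /DataSwap paths above come from the post;
# the 'rejoined' name and the cmp check are assumptions.
cat /DataSwap/parts* > /DataSwap/rejoined       # glob order matches split's suffix order
cmp /DataSwap/big.* /DataSwap/rejoined && echo identical   # cmp is silent when the bytes match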

Mass Deleting With split and map

# First off - this is how I got into the mess in the first place.
# Step 1: make a really big random file:
head -c $(( 1024 * 1024 * 1024 )) /dev/urandom > $(mktemp '/DataSwap/big.XXXXXXXX')

# Step 2: screw up and split it into a million (actually 1024*1024)
# separate 1024-byte files.
split -b 1024 /DataSwap/big.KFGgJs8S '/DataSwap/parts'

# Trying to delete them normally just fails because the command line is too long.
# This seems to be about as fast as I can get, using a whole
# bunch of parallel deleters.

# First make separate files of 1000 entries to remove and put them in shared
# memory for speed (a Linuxism) - split defaults to 1000 lines per output file.
# Run this from inside /DataSwap so that ls lists the part files by name.
ls | split - '/dev/shm/lses'

# Now make a function which can read a block and delete all
# the files listed.
function rm_block { for f in $(cat $1); do rm "/DataSwap/$f"; done; }

# A wrapper to easily put that in the background.
function rm_block_bg { rm_block $1 & }

# Now kick off the delete:
ls /dev/shm/ls* |
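# The preview truncates the pipeline above. Given the post's title, it
# presumably finishes with map (the helper defined in the utils post below);
# that completion is an assumption, as are the wait and the cleanup:
ls /dev/shm/ls* | map rm_block_bg
wait                 # block until every background deleter has finished
rm /dev/shm/lses*    # tidy up the block lists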

bash 'header' files

# So you want to load a set of library functions but not
# constantly reload them if they are already loaded?
#
# For example:
if [[ ! -n $__UTILS_LOADED__ ]]
then
    function print { local line="$@"; printf "%s\n" "$line"; }
    function map { local l; while read -r l; do $1 $l; done; }
    print '*** UTILS LOADED ***'
    __UTILS_LOADED__=TRUE
fi

# Now you can put...
source /some/path/to/lib/utils.sh
# ...wherever you want.
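# A quick sanity check of the guard (hypothetical session, assuming the
# snippet above is saved as /some/path/to/lib/utils.sh):
source /some/path/to/lib/utils.sh   # prints *** UTILS LOADED ***
source /some/path/to/lib/utils.sh   # prints nothing - already loaded
echo "$__UTILS_LOADED__"            # TRUE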

Parsing Columns From Files WITHOUT awk

# awk is cool - but sometimes jumping from bash to awk to bash gets clunky.
# We don't actually have to use awk - we can just leverage bash's internal parsing.
# This is a classic awk example, but now we have the really useful function map
# which lets us do the same thing in bash.
function size_name {
    print $5 $9
}
function map {
    local l;
    while read -r l; do
        $1 $l;
    done
}
# map relies on word splitting: $l is unquoted, so size_name sees each
# line's columns as its positional parameters.

ls -l SonicField/src/cpp/lib/ | map size_name | column -t
168       build.sh
2108      stream.hpp
15676048  stream.hpp_out

# Let's try this as a one-liner:
_tmp () { print $5 $9; }; ls -l SonicField/src/cpp/lib/ | map _tmp | column -t

# awk is simpler, but not by much if we assume map is part of your utils:
ls -l SonicField/src/cpp/lib/ | awk '{print $5, $9}' | column -t
168       build.sh
2108      stream.hpp
15676048  stream.hpp_out

# But remember that awk is a separate process, so you do not
# have access to the bash state in the same way. For example:
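# The preview is cut off at the example, so the following illustration of
# the point is an assumption, not from the original post: a bash function
# run via map reads shell variables directly, while awk only sees what you
# pass in explicitly (here via -v). min and big_enough are hypothetical names.
min=1000
function big_enough { (( ${1:-0} > min )) && print $1 $2; }   # ${1:-0} guards blank lines
ls -l SonicField/src/cpp/lib/ | map size_name | map big_enough

# The awk equivalent has to import the shell variable explicitly:
ls -l SonicField/src/cpp/lib/ | awk -v min="$min" '$5 > min {print $5, $9}'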