Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

2021-12-31

String Associative Array and CombSort Benchmark 2021 Edition

Last year, we've done string associative benchmark and lesser string associative benchmark (measuring string concat operation and built-in associative array set and get), numeric comb sort benchmark and string comb sort benchmark (measuring basic array random access, string conversion, and array swap for number and string), this year's using newer server: 32-core running on 64-bit Ubuntu 21.10. This time we will skip programming languages that are no deb packages (unless the install script is just one line and doesn't ruin system directories) or no direct compile-run command like previous one, also only best of 3 runs.

$ alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"

This time, NodeJS failed to complete (after waiting for an hour) with 10x more data compared to last year for assoc benchmark. Here's the spreadsheet and final result (Real duration and RAM):

LanguageCommand FlagsVersionAssocRAMNum CombRAMStr CombRAMTotalRAM
Gogo rungo1.17.512.392,305,8240.4683,0963.33245,89616.182,634,816
Javajava18-ea+15-Ubuntu-410.705,582,3080.96170,2406.93722,15218.596,474,700
Nimnim r -d:release --gc:arc1.4.217.904,200,2121.2179,4445.45633,81624.564,913,472
Pythonpypy7.3.519.703,104,4402.17140,1245.73523,26427.603,767,828
Ctcc -run0.9.2722.682,820,5681.0180,4844.80392,89628.493,293,948
Juliajulia1.6.523.983,714,4480.52255,8444.60861,07629.104,831,368
Vv -prod run0.2.4 a0a180720.368,910,8562.67124,2486.48470,13629.519,505,240
Lualuajit2.1.0-beta316.701,133,8763.27133,50410.80511,46830.771,778,848
Dartdart2.15.125.922,101,1521.40208,0565.75574,85233.072,884,060
Crystalcrystal run --release1.2.213.772,371,3207.73202,55212.08440,74833.583,014,620
Nimnim r -d:release1.4.224.923,864,1001.7679,5487.511,211,20034.195,154,848
PHPphp8.0.89.941,368,8087.74328,34424.84641,44842.522,338,600
Crystalcrystal run0.35.138.852,372,0049.97179,17622.27441,72071.092,992,900
Vv run0.2.4 a0a180751.818,911,0046.6379,71618.43470,42076.879,461,140
Pythonpython33.9.743.114,106,89229.16405,33243.08722,996115.355,235,220
Nimnim r1.4.288.053,864,0482.9379,53632.601,211,260123.585,154,844
Rubyrubyruby 2.7.4p19152.482,970,90827.15100,32052.23708,940131.863,780,168
Javascriptnodev16.13.1999.999,999,9990.71115,0446.11461,8361006.8110,576,879

FAQ

1. Why you measure the compile duration too? because developer experience also important (feedback loop, edit-rebuild/compile-run), at least for me, it would be sucks a lot if we have to wait a minute to compile before we can test something. We could always write precalculated values with C++ template to make runtime faster for example, but the compilation delay would be very sucks.
2. Why not warming up the VM first? each implementation have it's own advantage and disadvantage. We already know, that compiled language mostly faster at runtime, but at cost of relatively slower development feedback loop. Interpreted language mostly slower at runtime, especially if executed using VM that have startup overhead, in exception of one with AOT or JIT optimization. So to make it fair for every kind of implementation, we do it differently by not glorifying the runtime performance (which super make sense for server or long-lived process, but not best for development or CI cost which people often neglected), but by total performance which consist of Compile duration (if any) + VM startup duration (if any) + AOT or JIT duration (if any) + Runtime duration, so every strategy the PL's implementator use can be fairly judged.
3. Why there's no C++, VB.NET, C#, D, Object-Pascal? don't want to compile things (since there's no build and run command in one flag).  
4. Why there's no Kotlin, Scala, Rust, Elixir, Pony, Swift, Groovy, or Zig? Too lazy to add :3 you can contribute tho (create a pull request, then I'll run the benchmark again as preferably as there's precompiled binary/deb/apt/ppa repository for the compiler/interpreter).
5. Why there's no Ruby 3.1? I can't find any PPA for latest Ruby, latest one on Ubuntu 21.10 repo is Ruby2.7.

Contributorsilmanzo (Nim, Crystal, D), inkydragon (Julia)

2021-03-13

Pyroscope: Continuous Tracing in Go, Python, or Ruby

Recently I stumbled upon slow library/function problem and don't know chich part that causes it, and found out that there's a easy way to trace either Go, Ruby, or Python code using Pyroscope. The feature is a bit minimalist, there's no memory usage tracing yet unlike in gops or pprof. Pyroscope consist 2 parts: the server and the agent/client library (if using Golang) or executor (if using Ruby or Python). Here's the way how to run and start Pyroscope server:

# run server using docker
docker run -it -p 4040:4040 pyroscope/pyroscope:latest server

And here's the example on how to use the client library/agent (modifying Go's source code, just like in DataDog or any other APM tools) and install  the Pyroscope CLI to run Ruby/Python scripts:

# golang, add agent inside the source code
import "github.com/pyroscope-io/pyroscope/pkg/agent/profiler"
func main() {
  profiler.Start(profiler.Config{
    ApplicationName: "my.app.server", 
    ServerAddress:   "http://pyroscope:4040",
  })
  // rest of your code
}

# ruby or python, install CLI client 
cd /tmp
wget https://dl.pyroscope.io/release/pyroscope_0.0.28_amd64.deb
sudo apt-get install ./pyroscope_0.0.28_amd64.deb

# ruby
pyroscope exec ruby yourcode.rb

# python
pyroscope exec python yourcode.py

It would show something like this if you open the server URL (localhost:4040) in the browser, so you can check which part of the code that took most of the runtime.




2021-01-15

Learn Python in X minutes

Too bad that Python become increasingly more popular now, the fastest implementation (PyPy which nearly as fast as Dart > Java and TCC > Crystal and NodeJS > Nim > Go) somehow not fully compatible with the default implementation (CPython which slower than CRuby > Lua > PHP >> LuaJIT and V > PyPy). Haven't really learned about Python (and Lua) in depth (last time I learned it is about 8 years ago, same as RubyOnRails), since most of my projects can be covered with Go (backend), Ruby (scripting), Javascript (frontend), C# (games). Probably the biggest motivation to learn Python is data science (there's a bunch of libraries binding), also for LuaJIT is possibly the fastest embeddable language (FFI) that could bind to C easily compared to another language and bunch of game engines that uses Lua (also have you heard Core? it's Unity3D like game engine, that have low visibility). But this article is not about Lua, so now we'll try to learn Python, the most used stuff only (one that used in competitive programming):

Variable and data types (int, long, string, float, bool) and structures (list, tuple, dict, set):

v1 = 1 # int or long (bigint) data type
help(v1) # list of methods and documentations
v1 = 'str' # string data type
dir(v1) # list of methods as array of string
v1 = 1.2 # float
print(v1.__doc__) # get documentation of current type or method
type(v1) == float

v1 = [1,2,'test'] # list data type
v1[-1] == 'test' # True (bool data type)
v1[::2] == [1,'test'] # step each 2
v1[:2] == [1,2] # from beginning until before index 2
v1[1:] == [2,'test'] # from index 1 until the end
range(3) == [0,1,2] # generate from range
range(1,7,2) == [1,3,5] # jump every 2
del v1[1] # delete at pos 1
any([True,1,2,3]) == True
any(['',0,False]) == False
sum([1,2,3]) == 6
v1 = [1,2,3]
v1.append(4) # changes v1 to [1,2,3,4]
v2 = v1[:] # deep copy
v1.pop() # 4
v1.remove(3) # remove first occurence
v1.insert(1,99) # changes v1 to [1,99,2]
v1.index(99) # first 99 found at position 1
v1 + v2 # [1,99,2,1,2,3,4]
v1.extend(v2) # updates v1 to ^
len(v1) # 7
i = iter(v1)
next(i) # 1
v1 = [3,2,1]
v1.sort() # v1 now [1,2,3]

v1 = (1,'test') # tuple data type
tuple([1,2,'test']) == (1,2,'test') # convert to tuple
list((1,2,'test')) == [1,2,'test'] # convert to list
v1[0] == 1 # can be indexed, but readonly
v1 += (2,) # (1,'test',2)

v1 = {'a':1, 2:3} # dict data type
list(v1) == v1.keys()
v1.values() == [1,3]
v1[4] = 5 
3 in v1 # False, because no key 3 in v1
v1.get(99) == None 
del v1['a']

v1 = set()
v1 = {1,2,2,1,1,3} # {1,2,3}
v1.add(3) # no effect
v1 & {1,2} # intersection
v1 | {3,4} # union
v1 - {1,2} # difference
v1 ^ {1,4} # symetric difference
v1 >= {1,2} # True if right all included in v1
v1.copy() # deep copy

Operators (arithmetic, comparison, logical, assignment, bitwise, identity/is, membership/in):

v1 / 2 # float division
v1 // 2 # integer division
v1 ** 3 # exponentiation
not (v1 > 3) # bool
v1 != 2 and v1 < 5 # bool
v1 != 2 or v1 < 5 # bool
v1 is 3 # only true when referring to same object
3 is 3 # variable with const always false
'test' + '123' # concat


v1 = '%s %d %.2f' % ('yay',1,1.2345)
v1 == 'yay 1 1.23'

v1 = '%(name)s age is %(age).2f' % {'name':'kis', 'age':34.5}
v1 == 'kis age is 34.50'

v1 = """
multi line {}
""".format('string')
v1 == '\nmulti line string\n'

'%4s' % 'a' == '   a' # 3 space before a, no need for tuple if one

int('2') == 2
float('2.34') == 2.34
str(123.456) == '123.456'
bool(0) # False, also False for '', [], {}, ()
print('yay',end='') # without newline

Control structure (if, for, while):

if 's' in v1:
  print('inside')
elif '\n' in v1:
  print('have newline')
else:
  print('not having s or newline')

for v in v1: # only key if dict, can be iter
  print(v) # can use continue or break
else:
  print('only called when no break')

while True:
  break

# list comprehension
[x * x for x in [1,2,3]] == [1,4,9]
[x * y for x in [1,2,3] for y in [10,20]] == [10,20,30,40,50,60]

# dict comprehension
{x:1 for x in 'abcdef' if x not in 'acf'} == {'b':1,'d':1,'e':1}

Function and lambda:

d = 0

def f1(a,b=2,c='c'):
  global d # if need to modify global variable
  d += 1
  print(a,b,c,d) # print as tuple

f1([],3) # ([],3,'c',1)

f1 = lambda(y): y + 'test'
print(f1('a')) # atest

filter(lambda(x): x%2 == 0, [1,2,3,4,5]) == [2,4]

def f2(*a):
  print(a)

f2(1,2,'a') # (1,2,'a')

def f3(**a):
  print(a)

f3(x=1,y=2) # {'x':1,'y':2}

v1 = ['a','foo','ca']
v1.sort(key=lambda(x): len(x)) # sort by length

def f4(x):
  return -len(x)

v1.sort(key=f4) # without lambda

Class and object:

class Human(object):
  static_var = 10
  def __init__(self):
    self.name = 'kis'
  def set_age(self,a):
    self.age = a
  def get_both(self):
    return (self.name,self.age)
  def static_method():
    return 'whoa'

h = Human()
h.set_age(34)
print(h.get_both())
h.address = 'dzf 1/23' # create new member
del h.address 

Human.static_var = 3 # all object points to this new value
h.static_var = 4 # only this object points to this new value
h2 = Human()
h2.static_var == 3 # last time changed from 10 to 3

Importing libraries:

import random
from time import clock
random.seed(clock())
v1 = random.randint(1,10) # inclusive 1 to 10
print(v1)

import math
math.sqrt(9) # 3.0

from math import ceil, floor
ceil(3.4) # 4.0
floor(3.9) # 3.0

import math as m
m.sqrt(4) # 2.0

from heapq import heappush, heappop, heapify
v1 = []
heappush(v1,4)
heappush(v1,6)
heappush(v1,5)
heappop(v1) # 4
heappop(v1) # 5
heappop(v1) # 6
v1 = [(4,'a'),(2,'b'),(1,'c'),(3,'d')]
heapify(v1) # heappop() 4 times will return sorted order

from bisect import bisect_left # C++'s lower_bound
from collections import deque
v1 = deque(v1)
v1.appendleft((1,'c'))
bisect_left(v1,(2,'b')) == 1 # returns nearest if not exists

class KV(object):
  def __init__(self, l, key):
    self.l = l
    self.key = key
  def __len__(self): # overrides len()
    return len(self.l)
  def __getitem__(self, index): # overrides []
    return self.key(self.l[idx])
  def get(self, idx):
    return l[idx]

v1 = 
KV(v1,lambda(x): x[0])
bisect_left(v1,3)

I think that's for now, I exclude try-except-else-finally (or catch all exception) and making a module in this tutorial.

2020-12-22

String Associative Array and CombSort Benchmark 2020 Edition

5 Years later since last string associative benchmark and lesser string associative benchmark (measuring string concat operation and built-in associative array set and get), numeric comb sort benchmark and string comb sort benchmark (measuring basic array random access, string conversion, and array swap for number and string), this year's using newer processor: AMD Ryzen 3 3100 running on 64-bit Ubuntu 20.04. Now with 10x more data to hopefully make the benchmark runs 10x slower (at least 1 sec), best of 3 runs.

alias time='/usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB"'

$ php -v
PHP 7.4.3 (cli) (built: Oct  6 2020 15:47:56) ( NTS )

$ time php assoc.php 
637912 641149 67002
3808703 14182513 2343937
CPU: 1.25s      Real: 1.34s     RAM: 190644KB

$ python3 -V
Python 3.8.5

$ time python3 dictionary.py
637912 641149 67002
3808703 14182513 2343937
CPU: 5.33s      Real: 5.47s     RAM: 314564KB

$ ruby3.0 -v
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux-gnu]

$ time ruby3.0 --jit hash.rb 
637912 641149 67002
3808703 14182513 2343937
CPU: 6.50s      Real: 5.94s     RAM: 371832KB

$ go version
go version go1.14.7 linux/amd64

$ time go run map.go
637912 641149 67002
3808703 14182513 2343937
CPU: 1.79s      Real: 1.56s     RAM: 257440KB

$ node -v       
v14.15.2

$ time node object.js
637912 641149 67002
3808703 14182513 2343937
CPU: 2.24s      Real: 2.21s     RAM: 326636KB

$ luajit -v 
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/

$ time luajit table.lua
637912  641149  67002
3808703 14182513        2343937
CPU: 4.11s      Real: 4.22s     RAM: 250828KB

$ dart --version
Dart SDK version: 2.10.4 (stable) (Unknown timestamp) on "linux_x64"

$ time dart map.dart
637912 641149 67002
3808703 14182513 2343937
CPU: 2.99s      Real: 2.91s     RAM: 385496KB

$ v version
V 0.2 36dcace

$ time v run map.v
637912, 641149, 67002
3808703, 14182513, 2343937
CPU: 4.79s      Real: 5.28s     RAM: 1470668KB

$ tcc -v
tcc version 0.9.27 (x86_64 Linux)

$ time tcc -run uthash.c
637912 641149 67002
3808703 14182513 2343937
Command exited with non-zero status 25

CPU: 2.52s      Real: 2.61s     RAM: 291912KB

export GOPHERJS_GOROOT="$(go1.12.16 env GOROOT)"
$ npm install --global source-map-support

$ goperjs version
GopherJS 1.12-3

$ time gopherjs 
637912 641149 67002
3808703 14182513 2343937

CPU: 14.13s     Real: 12.01s    RAM: 597712KB

$ java -version
java version "14.0.2" 2020-07-14
Java(TM) SE Runtime Environment (build 14.0.2+12-46)
Java HotSpot(TM) 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

$ time java hashmap.java
637912 641149 67002
3808703 14182513 2343937

CPU: 5.18s      Real: 1.63s     RAM: 545412KB

The result shows a huge improvement for PHP since the old 5.4. NodeJS also huge improvement compared to old 0.10. The rest is quite bit the same. Also please keep note that Golang and V includes build/compile time not just run duration, and it seems V performance really bad when it comes to string operations (the compile itself really fast, less than 1s for 36dcace -- using gcc 9.3.0). 
Next we're gonna benchmark comb sort implementation. But this time we use jit version of ruby 2.7, since it's far way faster (19s vs 26s and 58s vs 66s for string benchmark), for ruby 3.0 we always use jit version since it's faster than non-jit. In case for C (TCC) which doesn't have built-in associative array, I used uthash, because it's the most popular. TinyGo does not complete first benchmark after more than 1000s, sometimes segfault. XS Javascript engine failed to give correct result, engine262 also failed to finish within 1000s.

LanguageCommand FlagsVersionAssocRAMNum CombRAMStr CombRAMTotalRAM
Gogo run1.14.71.56257,4400.7382,8444.74245,4327.03585,716
Gogo run1.15.61.73256,6200.7882,8964.86245,4687.37584,984
Nimnim r -d:release --gc:arc1.4.21.56265,1720.7979,2845.77633,6768.12978,132
Nimnim r -d:release --gc:orc1.4.21.53265,1600.9479,3805.83633,6368.30978,176
Javascriptnode14.15.22.21327,0480.87111,9726.13351,5209.21790,540
Crystalcrystal run --release0.35.11.81283,6481.44146,7006.09440,7969.34871,144
Javascript~/.esvu/bin/v88.9.2011.77177,7480.89105,4166.71335,2369.37618,400
Ctcc -run0.9.272.61291,9121.4580,8326.40393,35210.46766,096
Javajava14.0.2 2020-07-141.63545,4121.50165,8647.69743,57210.821,454,848
Nimnim r -d:release1.4.21.91247,4560.9679,4768.381,211,11611.251,538,048
Dartdart2.10.42.91385,4961.61191,9167.31616,71611.831,194,128
Pythonpypy7.3.1+dfsg-22.19331,7762.83139,7408.04522,64813.06994,164
Javascript~/.esvu/bin/chakra1.11.24.02.73487,4001.27102,19211.27803,16815.271,392,760
Javascript~/.esvu/bin/jsc2711175.90593,6240.68111,9729.09596,08815.671,301,684
Vv -prod run0.2 32091dd gcc-10.24.781,469,9321.8679,37614.061,560,51620.703,109,824
Lualuajit2.1.0-beta34.11250,8283.76133,42412.91511,19620.78895,448
Javascript~/.esvu/bin/smJavaScript-C86.0a15.61378,0641.4096,48013.81393,37620.82867,920
Vv -prod run0.2 32091dd gcc-9.35.051,469,9362.1479,40814.621,560,48421.813,109,828
Javascript~/.esvu/bin/graaljsCE Native 20.3.07.78958,3804.45405,90014.31911,22026.542,275,500
Gogopherjs run1.12-3 (node 14.15.2)11.76594,8962.04119,60418.46397,39632.261,111,896
Nimnim r1.4.26.60247,4443.0579,33231.851,211,20841.501,537,984
PHPphp7.4.31.34190,64410.11328,45234.51641,66445.961,160,760
Rubytruffleruby21.1.0-dev-c1517c5514.542,456,1563.09453,15229.273,660,28446.906,569,592
Crystalcrystal run0.35.15.69284,32812.00153,82831.69441,74049.38879,896
Javascript~/.esvu/bin/quickjs2020-11-083.90252,48423.4880,77234.80471,62462.18804,880
Vv run0.2 36dcace gcc-9.35.281,470,6686.6080,23258.991,561,17670.873,112,076
Lualua5.3.35.98366,51627.26264,64846.05864,30079.291,495,464
Rubyruby2.7.0p06.31371,45619.29100,53658.82694,56084.421,166,552
Pythonpython33.8.55.47314,56433.96404,97647.79722,82087.221,442,360
Rubyjruby9.2.9.07.451,878,18434.111,976,84459.837,115,448101.3910,970,476
Rubyruby3.0.0p05.94371,83224.8792,84474.321,015,096105.131,479,772
Gotinygo run0.16.0999.99318,1483.68300,548252.34711,3401256.011,330,036

Golang still the winner (obviously, since it's compiled), then Nim (Compiled), next best JIT or interpreter is NodeJS, Crystal (Compiled, not JIT), v8, followed Java, by TCC (Compiled) Dart, PyPy, V (Compiled, not JIT), LuaJIT, PHP, Ruby, and Python3. The recap spreadsheet can be accessed here

FAQ:
1. Why you measure the compile duration too? because developer experience also important (feedback loop), at least for me.
2. Why not warming up the VM first? each implementation have it's own advantage and disadvantage.
3. Why there's no C++, VB.NET, C#, D, Object-Pascal? don't want to compile things (since there's no build and run command in one flag).  
4. Why there's no Kotlin, Scala, Rust, Pony, Swift, Groovy, Julia, Crystal, or Zig? Too lazy to add :3 you can contribute tho (create a pull request, then I'll run the benchmark again as preferabbly as there's precompiled binary/deb/apt/ppa repository for the compiler/interpreter).

Contributors
ilmanzo (Nim, Crystal, D)