# Bucket.py, with changes committed by Christian Fromme
"""
This module is responsible for everything concerning file bucket bridge 
distribution. File bucket bridge distribution means that unallocated bridges 
are allocated to a certain pseudo-distributor and later written to a file.

For example, the following is a dict of pseudo-distributors (also called 
'bucket identifiers') with numbers of bridges assigned to them:

        FILE_BUCKETS = { "name1": 10, "name2": 15, "foobar": 3 }

This configuration for buckets would result in 3 files being created for bridge
distribution: name1-2010-07-17.brdgs, name2-2010-07-17.brdgs and 
foobar-2010-07-17.brdgs. The first file would contain 10 bridges from BridgeDB's
'unallocated' pool. The second file would contain 15 bridges from the same pool
and the third one similarly 3 bridges. These files can then be handed out to
trusted parties via mail or fed to other distribution mechanisms such as
Twitter.

Note that in BridgeDB slang, the _distributor_ would still be 'unallocated',
even though the database would now hold 'pseudo_name1', 'pseudo_name2' or
'pseudo_foobar' instead of 'unallocated'. This is why they are called
pseudo-distributors.
"""

import time
import bridgedb.Storage

# What should pseudo distributors be prefixed with in the database so we can
# distinguish them from real distributors?
PSEUDO_DISTRI_PREFIX = "pseudo_"

def getRealDistributorName(distributor):
    """Return the *real* ring name for a given one. This is needed because
       with pseudo distributors, we've got strings in the database that aren't
       real distributors. 
    """

    # If it starts with "pseudo_", it's really "unallocated"
    if distributor.startswith(PSEUDO_DISTRI_PREFIX):
        distributor = "unallocated"

    return distributor

class BucketData:
    """A file bucket value class.
       name      - Name of the bucket (From config)
       needed    - Needed number of bridges for that bucket (From config)
       allocated - Number of already allocated bridges for that bucket
    """
    def __init__(self, name, needed):
        self.name = name
        if needed == "*":
            # Set to ridiculously high number
            needed = 1000000
        self.needed = int(needed)
        self.allocated = 0

class BucketManager:
    """BucketManager reads a number of file bucket identifiers from the config.
       They're expected to be in the following format:

       FILE_BUCKETS = { "name1": 10, "name2": 15, "foobar": 3 }

       This syntax means that certain buckets ("name1", "name2" and so on)
       are given a number of bridges (10, 15 and so on). Names can be anything.
       The name will later be the prefix of the file that is written with the
       assigned number of bridges in it. Instead of a number, a wildcard item
       ("*") is allowed, too. This means that the corresponding bucket file
       will get the maximum number of possible bridges (as many as are left in
       the unallocated bucket).

       The files will be written in ip:port format, one bridge per line.

       The way this works internally is as follows:

       First of all, the assignBridgesToBuckets() routine runs through
       the database of bridges and looks up the 'distributor' field of each
       bridge. Unallocated bridges are sent to a pool for later assignment.
       Bridges already allocated for file bucket distribution are sorted and
       checked: does their bucket identifier still exist in the current
       config, and is the number of bridges assigned to it still valid? If
       the bucket identifier no longer exists, or too many bridges are
       currently assigned to it, the bridges go back to the unallocated
       pool.

       In the second step, after bridges are sorted and the unallocated pool
       is ready, the assignBridgesToBuckets() routine assigns one bridge
       from the unallocated pool to a known bucket identifier at a time until
       it either runs out of bridges in the unallocated pool or the number of
       needed bridges for that bucket is reached.

       When all bridges are assigned in this way, they can then be dumped into
       files by calling the dumpBridges() routine.

       cfg                   - The central configuration instance
       bucketList            - A list of BucketData instances, holding all
                               configured (and thus requested) buckets with
                               their respective numbers
       unallocatedList       - Holding all bridges from the 'unallocated'
                               pool
       unallocated_available - Is at least one unallocated bridge
                               available?
       distributor_prefix    - The 'distributor' field in the database will
                               hold the name of our pseudo-distributor,
                               prefixed by this
       db                    - The bridge database access instance
    """

    def __init__(self, cfg):
        self.cfg = cfg
        self.bucketList = []
        self.unallocatedList = []
        self.unallocated_available = False
        self.distributor_prefix = PSEUDO_DISTRI_PREFIX
        self.db = bridgedb.Storage.Database(self.cfg.DB_FILE+".sqlite",
                                            self.cfg.DB_FILE)

    def __del__(self):
        self.db.close()

    def addToUnallocatedList(self, hex_key):
        """Add a bridge by hex_key into the unallocated pool
        """
        try:
            self.db.updateDistributorForHexKey("unallocated", hex_key)
        except:
            self.db.rollback()
            raise
        else:
            self.db.commit()
        self.unallocatedList.append(hex_key)
        self.unallocated_available = True

    def getBucketByIdent(self, bucketIdent):
        """Do we know this bucket identifier? If yes, return the corresponding
           BucketData object.
        """
        for d in self.bucketList:
            if d.name == bucketIdent:
                return d
        return None

    def assignUnallocatedBridge(self, bucket):
        """Assign an unallocated bridge to a certain bucket
        """
        hex_key = self.unallocatedList.pop()
        # Mark pseudo-allocators in the database as such
        allocator_name = self.distributor_prefix + bucket.name
        #print "KEY: %d NAME: %s" % (hex_key, allocator_name)
        try:
            self.db.updateDistributorForHexKey(allocator_name, hex_key)
        except:
            self.db.rollback()
            # Ok, this seems useless, but for consistency's sake, we'll
            # re-assign the bridge from this missed db update attempt to the
            # unallocated list. Remember? We pop()'d it before.
            self.addToUnallocatedList(hex_key)
            raise
        else:
            self.db.commit()
        bucket.allocated += 1
        if len(self.unallocatedList) < 1:
            self.unallocated_available = False
        return True

    def assignBridgesToBuckets(self):
        """Read file bucket identifiers from the configuration, sort them and 
           write necessary changes to the database
        """
        # Build distributor list
        for k, v in self.cfg.FILE_BUCKETS.items():
            d = BucketData(k, v)
            self.bucketList.append(d)

        # Loop through all bridges and sort out distributors
        allBridges = self.db.getAllBridges()
        for bridge in allBridges:
            if bridge.distributor == "unallocated":
                self.addToUnallocatedList(bridge.hex_key)
                continue

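            # Return the bucket in case we know it already. The database
            # stores bucket names with the pseudo-distributor prefix, while
            # bucketList holds the bare names from the config, so strip the
            # prefix before the lookup.
            ident = bridge.distributor
            if ident.startswith(self.distributor_prefix):
                ident = ident[len(self.distributor_prefix):]
            d = self.getBucketByIdent(ident)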
            if d is not None:
                # Does this distributor need another bridge? If not, re-inject
                # it into the 'unallocated' pool for later assignment
                if d.allocated < d.needed:
                    d.allocated += 1
                else:
                    self.addToUnallocatedList(bridge.hex_key)
            # We don't know it. Maybe an old entry. Free it.
            else:
                # DON'T free anything important!
                if bridge.distributor.startswith(self.distributor_prefix):
                    self.addToUnallocatedList(bridge.hex_key)
                # else 
                #   SCREAM_LOUDLY? 

        # Loop through bucketList while we have and need unallocated
        # bridges, assign one bridge at a time
        while self.unallocated_available and len(self.bucketList) > 0:
            # Iterate over a copy, because satisfied buckets are removed from
            # bucketList inside the loop
            for d in self.bucketList[:]:
                if d.allocated >= d.needed:
                    # When we have enough bridges, remove bucket identifier
                    # from list
                    self.bucketList.remove(d)
                elif not self.unallocated_available:
                    # The pool ran dry in the middle of this round
                    break
                elif not self.assignUnallocatedBridge(d):
                    print("Couldn't assign unallocated bridge to %s" % d.name)

    def dumpBridgeToFile(self, bridge, filename):
        """Dump a given bridge into a given file
        """
        try:
            # The with block closes the file even if the write fails
            with open(filename, 'a') as f:
                f.write("%s:%s\n" % (bridge.address, bridge.or_port))
        except IOError:
            print("I/O error: %s" % filename)
         
    def dumpBridges(self):
        """Dump all known file distributors to files
        """
        allBridges = self.db.getAllBridges()
        for bridge in allBridges:
            # Compare by value; 'is' tests object identity and is unreliable
            # for strings
            if bridge.distributor == "":
                continue
            distributor = bridge.distributor
            if distributor.startswith(self.distributor_prefix):
                # Subtract the pseudo distributor prefix (only the leading
                # occurrence, not every match inside the name)
                distributor = distributor[len(self.distributor_prefix):]
            # Be safe. Replace all '/' in distributor names
            distributor = distributor.replace("/", "_")
            fileName = distributor + "-" + time.strftime("%Y-%m-%d") + ".brdgs"
            self.dumpBridgeToFile(bridge, fileName)
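

# Illustration only, not part of the original module: a minimal, database-free
# sketch of the one-bridge-at-a-time assignment described in the BucketManager
# docstring. The name sketchRoundRobin and its arguments are hypothetical; the
# real work is done by BucketManager.assignBridgesToBuckets(), which also
# updates the database. It assumes bucket counts are given as in FILE_BUCKETS,
# i.e. a number or the "*" wildcard.
def sketchRoundRobin(buckets, pool):
    """Hand out keys from `pool` to `buckets` one at a time, in turns.

       buckets - dict mapping bucket name to the needed number, or "*"
       pool    - list of unallocated bridge keys (consumed from the end)

       Returns a dict mapping each bucket name to the list of keys it got.
    """
    assigned = dict((name, []) for name in buckets)
    # Sort the names so the result does not depend on dict ordering
    pending = sorted(buckets)
    while pool and pending:
        # Iterate over a copy, since satisfied buckets leave `pending`
        for name in list(pending):
            needed = buckets[name]
            if needed != "*" and len(assigned[name]) >= int(needed):
                pending.remove(name)
            elif pool:
                assigned[name].append(pool.pop())
    return assigned

# Possible usage: sketchRoundRobin({"name1": 2, "foobar": "*"}, keys) gives
# "name1" at most two keys and leaves every remaining key to "foobar".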