Russell Branca [Fri, 12 Apr 2013 20:48:14 +0000 (16:48 -0400)]
Moving shard maps _membership endpoint to _shards db handler
Russell Branca [Fri, 12 Apr 2013 19:06:58 +0000 (15:06 -0400)]
Add doc shard info endpoint
Russell Branca [Thu, 11 Apr 2013 18:18:12 +0000 (14:18 -0400)]
Fix _membership/$DBNAME api endpoint
This switches the JSON key to be a binary, as required by jiffy.
Also, remove extraneous <<"parts">> path from the url.
Show full shard range.
Paul J. Davis [Tue, 19 Mar 2013 03:57:46 +0000 (22:57 -0500)]
Update to use new multi rexi_server protocol
Russell Branca [Wed, 18 Jun 2014 22:04:35 +0000 (15:04 -0700)]
Handle the #doc_info case in changes_enumerator
This is to handle the special case where the user migrates a CouchDB
database to BigCouch and they have not yet compacted the
database. Once the database has been compacted, this #doc_info clause
should never be encountered.
Robert Newson [Tue, 3 Jun 2014 10:21:02 +0000 (11:21 +0100)]
Don't log when ensuring dbs exist
Robert Newson [Wed, 7 May 2014 13:48:25 +0000 (14:48 +0100)]
Add function to determine shard membership locally
mem3:belongs/2 allows you to determine if a given doc id belongs to a
given shard (whether a #shard{} record or just the filename of a
shard) without looking up the shard map or making any remote
calls.
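
A rough sketch of the idea, in Python rather than Erlang and not taken from the mem3 source. It assumes the shard filename embeds its hash range as two hex numbers ("shards/BEGIN-END/dbname.suffix") and that doc ids are hashed with CRC32; the names shard_range and belongs are chosen for illustration only.

```python
import zlib


def shard_range(shard_name):
    """Parse the (begin, end) hash range out of a shard filename.

    Assumes a name shaped like 'shards/00000000-7fffffff/db.1234'.
    """
    range_part = shard_name.split("/")[1]
    begin_hex, end_hex = range_part.split("-")
    return int(begin_hex, 16), int(end_hex, 16)


def belongs(shard_name, doc_id):
    """Return True if doc_id hashes into the shard's range.

    Purely local: no shard map lookup and no remote call.
    """
    begin, end = shard_range(shard_name)
    key = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit in Python 3
    return begin <= key <= end
```

With two shards covering the lower and upper halves of the 32-bit space, any doc id belongs to exactly one of them.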
Robert Newson [Wed, 12 Feb 2014 23:23:47 +0000 (23:23 +0000)]
Change API to function per level
Robert Newson [Wed, 12 Feb 2014 20:11:56 +0000 (20:11 +0000)]
Switch to couch_log
Paul J. Davis [Tue, 11 Feb 2014 07:54:37 +0000 (01:54 -0600)]
Add license headers
Robert Newson [Mon, 23 Dec 2013 16:55:10 +0000 (16:55 +0000)]
Add ejson_body to all mem3 open_doc attempts that need it
Robert Newson [Thu, 19 Dec 2013 18:16:58 +0000 (18:16 +0000)]
Remove references to margaret
Robert Newson [Wed, 18 Dec 2013 14:04:59 +0000 (14:04 +0000)]
Build with rebar
Robert Newson [Thu, 13 Jun 2013 12:42:11 +0000 (13:42 +0100)]
Fix up copyright headers
Paul J. Davis [Tue, 5 Mar 2013 23:55:26 +0000 (17:55 -0600)]
New build system for mem3
Paul J. Davis [Wed, 20 Mar 2013 10:04:53 +0000 (05:04 -0500)]
Remove Cloudant build system remnants
Adam Kocoloski [Thu, 7 Mar 2013 22:03:38 +0000 (14:03 -0800)]
Merge pull request #43 from cloudant/guard-against-empty-list
Guard against empty list
Robert Newson [Thu, 7 Mar 2013 21:29:35 +0000 (15:29 -0600)]
Guard against empty list
Rotating an empty list gives badarith so add a guard clause, since the
result of rotating an empty list is well known.
BugzID: 17801
Adam Kocoloski [Thu, 7 Mar 2013 20:17:36 +0000 (12:17 -0800)]
Merge pull request #42 from cloudant/17801-spread-the-pain
BugzID: 17801
Robert Newson [Wed, 6 Mar 2013 14:29:13 +0000 (08:29 -0600)]
Spread ushards load to more nodes
In some cases, notably q=1 databases, the current ushards algorithm
will always choose the same replica (because of the lists:sort and
order-preserving orddict). This causes a severely skewed load profile
if you have lots of these cases.
This patch rotates each group of nodes using the crc32 of the database
name, spreading out the load pretty evenly.
The patch is a little obscure because ushards still has remnants of
previous work (breaking nodes into the local, same zone, different
zone, but then deliberately merging local and same zone back together
because that was a silly idea).
BugzID: 17801
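
A minimal sketch of the rotation described above, in Python rather than Erlang and not the actual ushards code. The function name rotate_by_dbname is hypothetical; the empty-list guard mirrors the later "Guard against empty list" fix.

```python
import zlib


def rotate_by_dbname(nodes, dbname):
    """Rotate a group of nodes left by crc32(dbname) mod len(nodes)."""
    if not nodes:
        # Rotating an empty list is well defined: it is the empty list.
        # Without this guard the modulo below would divide by zero.
        return []
    offset = zlib.crc32(dbname.encode("utf-8")) % len(nodes)
    return nodes[offset:] + nodes[:offset]
```

Because the offset depends only on the database name, every coordinator picks the same first replica for a given database, while different databases start at different positions in the group.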
Adam Kocoloski [Thu, 28 Feb 2013 21:05:47 +0000 (16:05 -0500)]
Ignore other config changes
Adam Kocoloski [Wed, 27 Feb 2013 19:32:11 +0000 (11:32 -0800)]
Merge pull request #40 from cloudant/13179-refactor-config-registration
Use config app instead of couch_config
Adam Kocoloski [Wed, 27 Feb 2013 19:31:23 +0000 (14:31 -0500)]
Updated the tests too
Adam Kocoloski [Wed, 27 Feb 2013 02:18:13 +0000 (21:18 -0500)]
Use config app instead of couch_config
BugzID: 13179
Adam Kocoloski [Thu, 21 Feb 2013 14:16:52 +0000 (06:16 -0800)]
Merge pull request #38 from cloudant/17185-reduce-log-spam
Replace cache miss log with metrics
BugzID: 17185
Adam Kocoloski [Thu, 21 Feb 2013 14:14:31 +0000 (06:14 -0800)]
Merge pull request #39 from cloudant/15754-mem3-sync-backlog
Add an API for mem3_sync queue lengths
BugzID: 15754
Paul J. Davis [Thu, 21 Feb 2013 07:08:21 +0000 (01:08 -0600)]
Add an API for mem3_sync queue lengths
Paul J. Davis [Sun, 10 Feb 2013 21:30:49 +0000 (15:30 -0600)]
Replace cache miss log with metrics
This also adds metrics for cache hits and evictions as well as misses.
Adam Kocoloski [Wed, 19 Dec 2012 16:07:35 +0000 (08:07 -0800)]
Merge pull request #36 from cloudant/13605-fix-shards-badmatch
Protect against cache_hits on nonexistent entries
BugzID: 13605
Paul J. Davis [Tue, 11 Dec 2012 21:11:15 +0000 (15:11 -0600)]
Protect against cache_hits on nonexistent entries
This can happen if we load a shard set from the cache and then eject the
shards before processing the cache_hit '$gen_cast' message. Instead of
trying to be fancy and reinserting the shards into the cache directly we
just rely on the fact that they'll be reinserted normally on the next
request.
Robert Newson [Thu, 6 Dec 2012 19:13:32 +0000 (11:13 -0800)]
Merge pull request #35 from cloudant/15924-dont-resurrect-on-delete
Don't resurrect shards on deletion
Robert Newson [Tue, 4 Dec 2012 19:42:05 +0000 (19:42 +0000)]
Don't resurrect shards on deletion
In bigcouch and bigcouchplus (at least) the deleted property is a
binary, <<"deleted">>, and not an atom, deleted, as it used to
be. This causes mem3_shards to recreate a shard immediately after it
is deleted, leading to profound silliness.
BugzID: 15924
Adam Kocoloski [Mon, 1 Oct 2012 20:10:01 +0000 (13:10 -0700)]
Merge pull request #33 from cloudant/11602-sync-security
BugzID: 11602
Adam Kocoloski [Mon, 1 Oct 2012 18:51:37 +0000 (11:51 -0700)]
Merge pull request #34 from cloudant/explicit_zone_placement
Explicit zone placement
BugzID: 14920
Robert Newson [Thu, 27 Sep 2012 22:47:49 +0000 (23:47 +0100)]
Placement is always specified as a string
Robert Newson [Wed, 26 Sep 2012 23:37:54 +0000 (00:37 +0100)]
Remove cruft
Robert Newson [Wed, 26 Sep 2012 23:38:09 +0000 (00:38 +0100)]
Explicit zone placement
Paul J. Davis [Tue, 25 Sep 2012 18:06:45 +0000 (13:06 -0500)]
Check security objects during internal replication
If we detect that two shards have different values for a security object
during internal replication we automatically trigger a security object
synchronization.
BugzId: 11602
Paul J. Davis [Tue, 25 Sep 2012 17:54:23 +0000 (12:54 -0500)]
Relax the mem3_sync_security fix constraint
Instead of requiring that we only have an empty security object and a
single non-empty version that outnumbers the empties, we relax the fixable
constraint to be a simple majority of (N/2)+1 objects of a single
value.
BugzId: 11602
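
The relaxed constraint can be sketched as follows. This is an illustration in Python, not the mem3_sync_security code: the function name is hypothetical, security objects are modelled as JSON-like dicts, and (N/2)+1 is read as integer division.

```python
import json
from collections import Counter


def fixable_security_object(objects):
    """Return the value held by a simple majority of shards, else None.

    A value wins only if at least (N // 2) + 1 of the N objects agree.
    """
    n = len(objects)
    if n == 0:
        return None
    # Serialize with sorted keys so equal objects compare equal.
    counts = Counter(json.dumps(obj, sort_keys=True) for obj in objects)
    value, count = counts.most_common(1)[0]
    if count >= n // 2 + 1:
        return json.loads(value)
    return None
```

With three copies, two agreeing objects are enough; with two copies, both must agree.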
Paul J. Davis [Tue, 25 Sep 2012 22:05:27 +0000 (17:05 -0500)]
Wait for rexi_server before adding a node
The race condition between a nodeup event and rexi_server starting was
causing some superfluous errors. This just waits for rexi_server to boot
before notifying mem3_sync_nodes.
Adam Kocoloski [Fri, 7 Sep 2012 14:28:11 +0000 (07:28 -0700)]
Merge pull request #32 from cloudant/14654-mem3-sync-stuck-replications
Fix stuck internal replications after node down
BugzID: 14654
Paul J. Davis [Fri, 7 Sep 2012 06:57:03 +0000 (01:57 -0500)]
Fix stuck internal replications after node down
We weren't removing entries from the dict tracking what was in the job
queue. This looks like a bug introduced by the switch from tuples to the
#job{} record, which means it has probably been around for quite a while.
The fix is simply to use the correct dict key.
BugzId: 14654
Adam Kocoloski [Wed, 22 Aug 2012 13:29:18 +0000 (06:29 -0700)]
Merge pull request #31 from cloudant/14348-node-redirects
Configurable redirect of mem3 push jobs
BugzID: 14348
Robert Newson [Tue, 21 Aug 2012 16:31:08 +0000 (17:31 +0100)]
Configurable redirect of mem3 push jobs
In order to facilitate smoother node replacements, mem3 can be
configured to redirect push jobs intended for one node (the failed
one) to another (its replacement), e.g.,
    [mem3.redirects]
    dbcore@db1.foo.cloudant.com = dbcore@db2.foo.cloudant.com
BugzID: 14348
Adam Kocoloski [Wed, 6 Jun 2012 20:58:12 +0000 (16:58 -0400)]
Remove obsolete appup
Adam Kocoloski [Wed, 6 Jun 2012 19:51:58 +0000 (15:51 -0400)]
Merge 'origin/replicator', closes #16
Conflicts:
src/mem3_rep_manager.erl
src/mem3_sup.erl
src/mem3_util.erl
Case 13884
Adam Kocoloski [Tue, 5 Jun 2012 20:19:16 +0000 (16:19 -0400)]
Add upgrade instructions
Adam Kocoloski [Tue, 5 Jun 2012 17:52:32 +0000 (13:52 -0400)]
Export live_shards/2
Adam Kocoloski [Mon, 4 Jun 2012 21:07:36 +0000 (17:07 -0400)]
Remove unused include_lib
Eventually we should insert the validation function automatically.
BugzID: 13780
Adam Kocoloski [Mon, 4 Jun 2012 21:06:46 +0000 (17:06 -0400)]
Avoid #changes_args.db_open_options for compatibility
We'll add it once we deploy the new version of the #changes_args record.
BugzID: 13780
Adam Kocoloski [Mon, 4 Jun 2012 15:11:36 +0000 (11:11 -0400)]
Look for 'deleted' and <<"deleted">> for compatibility
BugzID: 13780
Adam Kocoloski [Mon, 4 Jun 2012 14:58:40 +0000 (10:58 -0400)]
Fix error causing crash on add_node
Adam Kocoloski [Fri, 1 Jun 2012 16:03:55 +0000 (12:03 -0400)]
Merge branch '1.3.x'
Conflicts:
src/mem3.erl
src/mem3_cache.erl
src/mem3_nodes.erl
src/mem3_rep.erl
src/mem3_sup.erl
src/mem3_sync.erl
src/mem3_sync_event.erl
BugzID: 13780
Adam Kocoloski [Fri, 25 May 2012 14:39:47 +0000 (07:39 -0700)]
Merge pull request #30 from cloudant/13606-node-info-ets
Publish node metadata in a protected ets table
BugzID: 13606
Adam Kocoloski [Thu, 24 May 2012 16:43:23 +0000 (12:43 -0400)]
Remove remaining references to #state.nodes
Adam Kocoloski [Tue, 22 May 2012 01:32:22 +0000 (21:32 -0400)]
Publish node metadata in a protected ets table
This allows for far cheaper access to the zone information. The
fallback to gen_server:calls is only for the initial hot upgrade and can
be removed afterwards along with the code_change.
BugzID: 13606
Paul J. Davis [Fri, 11 May 2012 23:31:18 +0000 (18:31 -0500)]
Fix edge condition when loading shards from disk
BugzId: 13386
Adam Kocoloski [Tue, 8 May 2012 17:04:10 +0000 (13:04 -0400)]
Remove custom appup
Paul J. Davis [Tue, 8 May 2012 01:12:51 +0000 (20:12 -0500)]
Don't include deleted dbs in mem3:fold_shards/2
The `delete` option was passed to couch_db:open_doc/2 which ended up
causing deleted databases to be included in mem3:fold_shards/2. This is
counterintuitive so is being removed.
Adam Kocoloski [Mon, 7 May 2012 17:36:20 +0000 (10:36 -0700)]
Merge pull request #29 from cloudant/13511-1.3.x-improve-internal-replicator
Improve internal replicator configuration
BugzID: 13511
Paul J. Davis [Sun, 29 Apr 2012 17:07:52 +0000 (12:07 -0500)]
Improve internal replicator configuration
This work is needed to support the internal replication requirements for
cluster elasticity. This adds three new options:
* batch_size - The number of revisions to replicate in a single batch
* batch_count - The number of batches to replicate. The special value
`all` means to replicate until finished.
* filter - A 1-arity function that takes a #full_doc_info{} record and
returns `keep` or `discard` that determines if that doc should be
included in the replication.
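
How the three options might interact, as a Python sketch under stated assumptions. It is not the mem3_rep implementation: plan_batches and doc_filter are hypothetical names, the default batch size is invented, and the strings "keep" and "discard" stand in for the Erlang atoms.

```python
from itertools import islice


def plan_batches(doc_infos, batch_size=100, batch_count="all", doc_filter=None):
    """Group the kept revisions into batches.

    doc_filter takes one doc info and returns "keep" or "discard".
    batch_count is an integer, or "all" to run until nothing is left.
    """
    kept = (d for d in doc_infos
            if doc_filter is None or doc_filter(d) == "keep")
    batches = []
    while batch_count == "all" or len(batches) < batch_count:
        batch = list(islice(kept, batch_size))
        if not batch:
            break  # source exhausted
        batches.append(batch)
    return batches
```

For example, plan_batches(range(10), batch_size=3, batch_count=2) stops after two batches and leaves the remaining revisions for a later run.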
Adam Kocoloski [Thu, 3 May 2012 20:13:55 +0000 (16:13 -0400)]
Put mem3_sync replication exit messages on one line
Adam Kocoloski [Thu, 3 May 2012 17:56:44 +0000 (10:56 -0700)]
Merge pull request #28 from cloudant/13529-reconfigure-ring-on-nodeup
Reconfigure ring on nodeup events
BugzID: 13529
Robert Newson [Thu, 3 May 2012 13:48:16 +0000 (14:48 +0100)]
Deduplicate lookup of special local databases
Adam Kocoloski [Thu, 3 May 2012 13:20:37 +0000 (09:20 -0400)]
Reconfigure ring replications on nodeup
BugzID: 13529
Adam Kocoloski [Wed, 2 May 2012 20:12:03 +0000 (16:12 -0400)]
Reply immediately if we already enqueued the job
Adam Kocoloski [Wed, 2 May 2012 20:09:03 +0000 (16:09 -0400)]
Really remove the Job from the Q
Adam Kocoloski [Wed, 2 May 2012 19:01:44 +0000 (15:01 -0400)]
Reorder supervision tree to start mem3_nodes earlier
mem3_sync requires mem3_nodes to be running during init.
Adam Kocoloski [Tue, 1 May 2012 18:27:05 +0000 (11:27 -0700)]
Merge pull request #20 from cloudant/13470-make-ushards-zone-aware_master
Reimplement mem3:ushards to honor all 5 properties
BugzID: 13470
Adam Kocoloski [Tue, 1 May 2012 14:52:53 +0000 (10:52 -0400)]
Add upgrade instructions for .4,.6 -> .7
Adam Kocoloski [Tue, 1 May 2012 14:42:54 +0000 (07:42 -0700)]
Merge pull request #27 from cloudant/fix-initial-sync
Add a manager for node synchronization
Paul J. Davis [Tue, 1 May 2012 03:23:45 +0000 (22:23 -0500)]
Add a manager for node synchronization
Adam Kocoloski [Tue, 1 May 2012 01:02:02 +0000 (21:02 -0400)]
Don't be dumb about response formats
Adam Kocoloski [Mon, 30 Apr 2012 21:22:33 +0000 (17:22 -0400)]
Guard code_change to prevent future surprises
Adam Kocoloski [Mon, 30 Apr 2012 21:22:05 +0000 (17:22 -0400)]
Remove obsolete appup
Adam Kocoloski [Mon, 30 Apr 2012 18:10:28 +0000 (11:10 -0700)]
Merge pull request #26 from cloudant/13504-mem3-sync-large-tables-master
Optimize mem3 synchronization for large partition tables (master)
BugzID: 13504
Adam Kocoloski [Fri, 27 Apr 2012 19:05:51 +0000 (15:05 -0400)]
Make initial_sync block for each replication
It's low priority, and we don't want to overrun the server. The
implementation is kinda hacky: it sticks the From into #job.pid while
the job is in the waiting queue.
Adam Kocoloski [Fri, 27 Apr 2012 18:07:42 +0000 (14:07 -0400)]
Avoid mesh replication on add_node/nodeup events
BugzID: 13504
Adam Kocoloski [Fri, 24 Feb 2012 16:42:07 +0000 (11:42 -0500)]
Give up on mem3_rep if DB was deleted
Previously we would retry an infinite number of times when this
happened.
Adam Kocoloski [Fri, 24 Feb 2012 16:44:50 +0000 (11:44 -0500)]
Sync "dbs" and "_users" with the next live node
The load induced by fully-connected mesh replication increases
quadratically with node count. For large clusters with high rates of
database creation this ends up being significant.
This patch switches the topology to a ring. Each node pushes to the
next live node in the ring. Will deal with the correct action on
'nodeup' events in a separate commit.
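
A small sketch of the ring idea in Python, not the mem3_sync code. It assumes the ring order is simply the sorted node list; next_live_node is a hypothetical name.

```python
def next_live_node(me, all_nodes, live_nodes):
    """Return the first live node after `me` in ring order, or None."""
    ring = sorted(all_nodes)
    i = ring.index(me)
    # Walk the ring once, starting just after our own position.
    for candidate in ring[i + 1:] + ring[:i]:
        if candidate in live_nodes:
            return candidate
    return None  # no other node is up
```

Each node then maintains one push target instead of n - 1, so the total number of replication pairs grows linearly with cluster size rather than as n * (n - 1).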
Adam Kocoloski [Fri, 27 Apr 2012 17:21:51 +0000 (13:21 -0400)]
Fold over DBs on disk rather than load into memory
This uses the new mem3_shards:fold API to walk the shards from the
on-disk representation.
BugzID: 13504
Adam Kocoloski [Mon, 30 Apr 2012 17:33:00 +0000 (10:33 -0700)]
Merge pull request #24 from cloudant/13504-mem3-sync-large-tables
Optimize mem3 synchronization for large partition tables
BugzID: 13504
Adam Kocoloski [Fri, 27 Apr 2012 19:05:51 +0000 (15:05 -0400)]
Make initial_sync block for each replication
It's low priority, and we don't want to overrun the server. The
implementation is kinda hacky: it sticks the From into #job.pid while
the job is in the waiting queue.
Adam Kocoloski [Fri, 27 Apr 2012 18:07:42 +0000 (14:07 -0400)]
Avoid mesh replication on add_node/nodeup events
BugzID: 13504
Adam Kocoloski [Fri, 27 Apr 2012 19:14:41 +0000 (12:14 -0700)]
Merge pull request #23 from cloudant/13475-mem3-sync-waiting-queue-13x
Backport use of queue in mem3_sync to 1.3.x
BugzID: 13475
Adam Kocoloski [Fri, 24 Feb 2012 16:42:07 +0000 (11:42 -0500)]
Give up on mem3_rep if DB was deleted
Previously we would retry an infinite number of times when this
happened.
Adam Kocoloski [Fri, 24 Feb 2012 16:44:50 +0000 (11:44 -0500)]
Sync "dbs" and "_users" with the next live node
The load induced by fully-connected mesh replication increases
quadratically with node count. For large clusters with high rates of
database creation this ends up being significant.
This patch switches the topology to a ring. Each node pushes to the
next live node in the ring. Will deal with the correct action on
'nodeup' events in a separate commit.
Adam Kocoloski [Fri, 27 Apr 2012 17:21:51 +0000 (13:21 -0400)]
Fold over DBs on disk rather than load into memory
This uses the new mem3_shards:fold API to walk the shards from the
on-disk representation.
BugzID: 13504
Bob Dionne [Mon, 23 Apr 2012 20:11:05 +0000 (16:11 -0400)]
Optimize next_replication by using queue:out
Bob Dionne [Wed, 18 Apr 2012 16:38:40 +0000 (12:38 -0400)]
Replace waiting list in state with queue for improved performance
BugzID: 13475
Adam Kocoloski [Tue, 24 Apr 2012 17:43:41 +0000 (13:43 -0400)]
Update appup
BugzID: 13469
Adam Kocoloski [Tue, 24 Apr 2012 17:42:29 +0000 (13:42 -0400)]
Merge remote-tracking branch 'origin/13469-internal-rep-fix' into 1.3.x
Adam Kocoloski [Tue, 24 Apr 2012 17:06:38 +0000 (13:06 -0400)]
Customize upgrade instructions for .3 -> .4
Adam Kocoloski [Tue, 24 Apr 2012 16:37:00 +0000 (09:37 -0700)]
Merge pull request #19 from cloudant/13414-mem3-cache-lru
BugzID: 13414
Adam Kocoloski [Tue, 24 Apr 2012 16:34:56 +0000 (09:34 -0700)]
Merge pull request #22 from cloudant/13486-fix-zone-shuffle_1.3.x
BugzID: 13486
Adam Kocoloski [Tue, 24 Apr 2012 16:34:17 +0000 (09:34 -0700)]
Merge pull request #21 from cloudant/13486-fix-zone-shuffle_master
BugzID: 13486
Robert Newson [Tue, 24 Apr 2012 15:14:14 +0000 (16:14 +0100)]
Use rand_uniform to fix deterministic zone placement
BugzID: 13486
Robert Newson [Tue, 24 Apr 2012 15:14:14 +0000 (16:14 +0100)]
Use rand_uniform to fix deterministic zone placement
BugzID: 13486
Robert Newson [Mon, 23 Apr 2012 22:38:57 +0000 (23:38 +0100)]
Refactor for testability and reduce node_info calls
Paul J. Davis [Tue, 27 Mar 2012 10:36:45 +0000 (05:36 -0500)]
Replace mem3_cache with mem3_shards
This change addresses the unbounded cache that existed in
mem3_cache. The module has been renamed to mem3_shards and it now
contains an API for accessing shards in the cache so that hits can be
registered with the cache.
The cache itself is now a proper LRU with a bounded size. The other
behavior change is to let the cache warm after start instead of
preloading it with the entire contents of the shards database. This
means that shards will be inserted as they are read from disk which
introduces a period of cold cache when the node boots.
BugzId: 13414
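
The behaviour described above (bounded LRU, warmed lazily on reads, with hit, miss and eviction counts) can be sketched like this. It is a Python illustration rather than the mem3_shards implementation; the ShardCache class, its load callback and its counters are assumptions made for the example.

```python
from collections import OrderedDict


class ShardCache:
    """Bounded LRU cache that starts cold and fills as shards are read."""

    def __init__(self, max_size, load):
        self.max_size = max_size
        self.load = load            # reads a db's shards "from disk"
        self._entries = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get(self, dbname):
        if dbname in self._entries:
            self.hits += 1
            self._entries.move_to_end(dbname)  # mark as most recently used
            return self._entries[dbname]
        self.misses += 1
        shards = self.load(dbname)
        self._entries[dbname] = shards
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop least recently used
            self.evictions += 1
        return shards
```

Nothing is preloaded, so the first read of each database after boot is a miss; that is the cold-cache period the commit message mentions.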