Monday 1 November 2010

P4toDB README

NOTE: The definitive README for your version of P4toDB can be found in your downloaded zip file.

P4toDB is a tool that retrieves existing and updated metadata from the Perforce server and translates it into SQL statements for replay into a third-party SQL database.

Perforce server -> P4toDB -> SQL server

The metadata is low-level and is in a form similar to Perforce server checkpoint files. Textual translation is done in some cases to make the fields look more like the format produced by a Perforce client program, or what would be more natural in an SQL query.

P4toDB creates SQL tables that map directly to Perforce database files. For example, the database file that stores submitted revision information is db.rev, which becomes the rev table in the SQL database. P4toDB also creates SQL views as a convenience for querying Perforce branches, clients, labels and jobs. The jobs view is updated automatically when the Perforce server's jobspec is modified.

P4toDB talks by way of TCP/IP to the Perforce server, and via whatever method the target database's JDBC driver uses. This will usually also be TCP/IP.

P4toDB, the Perforce server, and the SQL database can all be run on the same or separate machines. For performance reasons, the machines should all be separate.

This article is divided into the following sections for your convenience:

To install and configure P4toDB, you will need:

A Perforce server at version 2009.2 or higher. A Perforce account with "super" privileges. A TCP/IP connection between P4toDB and the Perforce server. Uncompressed checkpoints/journals on the Perforce server. The checkpoints must be at the same version level as the live database and the Perforce server. A full standard JRE 6 or later on the following operating systems: Linux, Unix, Mac OS X or Windows XP and later. P4toDB is a Java program and requires this to run. A supported target SQL database:  Oracle 11, Microsoft SQL Server 2005, MySQL 5 (with InnoDB), IBM DB2 9, Postgres 8, or Derby 10.6.1 / JavaDB 10.5.3.A JDBC version 4 driver for one of the supported SQL databases. The ability to create accounts on and configure the SQL database. The database should be created with the UTF-8 character set and a case-sensitive collation. CREATE TABLE/VIEW, SELECT, UPDATE, INSERT and DELETE privileges on the target SQL database. The JDBC driver may require others. A bare minimum of 256 MB of RAM dedicated to P4toDB. Normal usage will require at least 512 MB. In extreme cases, P4toDB can use more than 4 GB.  Performing a trial run with a small Perforce database is recommended.

P4toDB can place a heavy load on the SQL database, so the aid of a database administrator (DBA) might be required to properly configure the target server. The default settings on the SQL database might not be sufficient, depending on the size of the Perforce server's metadata and the chosen P4toDB settings.

P4toDB is packaged as a zip archive file containing the program itself, its platform-specific launch-scripts, dependencies and documentation. It is left to the user to obtain the JDBC driver. Links to supported drivers are provided below, in the JDBC Driver Info section. See also, the Known Issues section for known limitations or bugs.

The name and contents of the zip archive are in the form:

p4todb-release.major.minor.patch/ p4todb.bat p4todb.sh p4todb.jar README.txt lib/ p4java.jar

The lib directory is where the JDBC drivers and their supporting files will be placed. The base directory will subsequently be referred to as $P4TODB_HOME.

To install P4toDB:

Extract the distribution archive and execute P4toDB to view its help output: unzip p4todb.zipcd $P4TODB_HOMEjava -jar p4todb.jar

The last command displays the "help" screen for P4toDB.

P4toDB gets its configuration from the following places:

The Java runtime environment (heap size, PermGen space). Optional P4toDB-specific defines passed in through the JRE. The P4toDB configuration file.

The configuration file is the primary place settings are stored. There can be multiple configuration files - one for each running instance of P4toDB.

Get the JDBC version 4 driver for your target database, and put it in the $P4TODB_HOME/lib directory. Prior JDBC versions will not be recognized by P4toDB.

Make note of the driver's Java class name. P4toDB loads the driver based on the class name. P4toDB gets its SQL database credentials from the JDBC driver connection URL. The form of this URL is vendor-specific, and described in the driver's documentation.

Note: The JDBC Driver Information section contains the java class names for the major JDBC drivers.

Get a configuration file template to customize. The template is self-documenting, and is created with the command: java -jar p4todb.jar --sample-config > p4server.properties 

Make sure the target SQL database exists and is configured.

The syntax for this is vendor-specific, but is something like:

CREATE DATABASE db_name;It is recommended to create the database using the UTF-8 character set. This will avoid any data corruption when translating data from the Perforce server. P4toDB connects to the Perforce server using the utf8 charset. In order for P4toDB to load the the code in the library directory $P4TODB_HOME/lib, the Java optional packages directory variable java.ext.dirs must be modified. The library directory should be prepended to it. This is handled automatically by the platform-specific launchers p4todb.bat and p4todb.sh.

P4toDB is now ready to initialize the database for replication.

Note: The executable bit for the p4todb.sh script should be set:

chmod +x p4todb.shTo inspect the DDL that would be used for this configuration before P4toDB commits it, print it out with the following:

Windows p4todb.bat --config-file p4server --print-ddl

Unix

./p4todb.sh --config-file p4server --print-ddl An index is created for every key in each replicated table. This might not always be optimal.

To turn off index creation or to create the schema without proceeding with checkpoint conversion, see the dont_create_indexes and init_and_quit flags in the Optional Configuration Defines section.

Initialize the replication instance with a specific checkpoint number:

Windows

p4todb.bat --config-file p4server --checkpoint 123 --one-shot

Unix

./p4todb.sh --config-file p4server --checkpoint 123 --one-shot  When this command completes, the P4toDB log file should be inspected to ensure that there were no errors. The --one-shot flag instructs P4toDB to only process a single file at a time, then quit. Note that when run on the live journal, the process will not exit until the journal has been rotated.

Note: The checkpoint you use must have been created with the same version of the Perforce server that is currently in use. If this is not the case, a runtime error might occur midway through processing the checkpoint.

Note: The checkpoint must be located in the Perforce server root (P4ROOT) directory. If you create checkpoints in another location, or move the checkpoints after creating them, you will need to relocate the checkpoint used to initialize P4toDB to the Perforce server root directory.

The target SQL database is now populated with data. To re-start replication after quitting, launch P4toDB with only the configuration file option. This is the recommended default invocation:

Windows

p4todb.bat --config-file p4server

Unix

./p4todb.sh --config-file p4serverP4toDB stores the state of replication in the SQL database, so a checkpoint number is only required for initial database population. The saved state includes the current journal number, the position, and the last update time.

P4toDB can safely be closed at any time after SQL table/view creation, as the target database consistency is protected by SQL transactions.

P4toDB defaults to stay-alive polling, where it follows from checkpoint to journal, and from rotated journal to journal up until the live one where it waits for new metadata to be written:

checkpoint.N -> journal.N -> journal.N+1 -> ... -> journal

If explicit control over which journal number to use is required, use the --force flag along with --journal jnl_num to override the state saved in the target database with the one provided. This should be used with care and is not normally needed. Example:

p4todb.bat --config-file p4server --force --journal 123

P4toDB allows for replicating the metadata from multiple source Perforce servers into a single SQL database:

P4D_1 -> P4toDB_1 -> \ P4D_2 -> P4toDB_2 -> SQL Database /P4D_3 -> P4toDB_3 ->

To do this, customize a P4toDB configuration file for each source Perforce server, and launch an instance of P4toDB for each server. P4toDB prepends the specified replication prefix in each of the configuration files to the name of the tables it creates.

For example, a prefix of sw_ makes the users table into sw_users.

The schema for the Perforce server at a given release is available through the web site. The URL is of the following format, where XYZ is the number of the release:

http://www.perforce.com/perforce/doc.XYZ/schema/ 092 <- 2009.2

There might be small discrepancies between the schema and what is generated by P4toDB. When in doubt, look to see what P4toDB created in the SQL database. The online schema should be a good place to start when learning about how the data is organized.

P4toDB translates Perforce data types into those of the target SQL database according to the output of p4 dbschema, the secondary formatting of the metadata, and any translations done to make querying more natural in SQL.

Converted types of note are:

changelists: a value of zero refers to the default pending changelist. MaxResults, MaxScanRows, MaxLockTime, and timeout: "unset" is 0, "unlimited" is -1.

When the schema on the source Perforce server changes, the replicated database must either be dropped and re-imported from scratch, or must have the appropriate ALTER TABLE statements run. DDL to move between schema versions will be supplied on a per-release basis.

New versions of the Perforce server do not necessarily imply changes in the schema. If they do, the changes often only affect a few of the tables. If the replicated tables in the target SQL database share the same table version as the new Perforce server, then no upgrading needs to be done.

P4toDB performance is largely dependent upon the speed of the target RDBMS, especially during the initial checkpoint import. The import is an INSERT for each record in the checkpoint.

If the initial checkpoint import takes a long time, it is recommended that you take a backup or an SQL dump of the database when the import is complete using the SQL version of the Perforce server's checkpoint. This technique allows for the use of the database's bulk-import programs, should a restore be necessary. Bulk importers will always be much faster than normal client programs.

Another way to improve performance is to only replicate a subset of the available tables. Some of the tables in Perforce contain data duplicated from other tables and are redundant to include in the SQL database.

These are key/value pairs passed into P4toDB via the JRE. For example:

java -Dvariable=value ... -jar p4todb.jar

There must not be any spaces between -D, the key, and its value. The following error will result if spaces appear:

Exception in thread "main" java.lang.NoClassDefFoundError

The value must be defined along with the key. Values are either the number one ("1"), or true. Not defining a key will make P4toDB use its default behavior. The keys listed below must all be preceded by "p4todb.", similar to p4todb.print_drivers. For example:

java -Dp4todb.print_drivers=1 -Djava.ext.dirs=lib -jar p4todb.jar --config-file cfg --print-ddl

These options are intended as transient configuration to influence per-invocation behavior. Valid defines are:

Exit after creating database tables.Print the DML created for a connection.Print the JDBC connection metadata.Print the names of available JDBC drivers.Print out the JRE system environment variables.When combined with --journal, --force and --one-shot, allows for replaying Perforce server journal patches, as these are not themselves journaled when you apply them. This variable is an exception, in that you pass it as a string which is the name of the journal patch.

The name of the patch must be of the form name.jnl.number, where the name and the number are variables to change. The number must be less than the current journal number.  For example, patch.jnl.0. It is good practice to keep any journal patches around for future use.

Here is an example invocation for a patch named jnl_patch.jnl.0:

java [other opts] -Dp4todb.tmp_jnl_prefix=jnl_patch -jar p4todb.jar --force --journal 0 --one-shot

Example queries are available in the .zip distribution, in the file queries.txt.

We are interested in hearing feedback about your experience using P4toDB. You can send any feedback you have to Perforce Support or leave feedback at the bottom of this article. We are particularly interested in the following sorts of questions:

What sort of things are you doing with the replicated metadata? Would you be interested in sharing your queries, or programs you have written to use the data? Was the documentation clear? Was there anything missing? How easy is it to use P4toDB? Could it be improved? Anything else you wouldd like to share.

Thank you for your help in making P4toDB a better product!

P4Report is a stateless stand-alone client application with its own SQL engine. Although this approach has its benefits, it is mainly suited for light-weight ad-hoc reporting tasks; its performance can be an issue, and user-access to metadata is controlled by the Perforce server's protections table.

P4toDB provides for more accessible data, as it allows the user to take advantage of existing third-party tools that work with the target SQL database. It does not make large demands of the Perforce server, and user access to metadata is not governed by the Perforce server's protection table, but rather by administrator-specified rights on the SQL database.

DB2:

DRIVER = com.ibm.db2.jcc.DB2DriverURL = jdbc:db2://server_host:port/db_name:user=db_user;password=db_pass;driverType=4;

The DB2 driver is named db2jcc4.jar, and requires db2jcc_license_cu.jar. Both must be placed in the lib directory. The driver's name is the "DB2 Universal Driver", and can be found in the DB2 "Application Development Client", or in the DB2 server install.

IBM DB2 Software:

http://www-01.ibm.com/software/data/db2/

Installing the IBM DB2 Driver for JDBC and SQLJ:

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.apdv.java.doc/doc/t0010264.htm

Properties for the IBM DB2 Driver for JDBC and SQLJ:

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.apdv.java.doc/doc/rjvdsprp.htm

Programming JDBC Applications:

http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/samples/jdbc/develop_jdbc.htm

DRIVER = org.apache.derby.jdbc.ClientDriverURL = jdbc:derby://server[:port]/databaseName[;URLAttributes=value[;...]]

Overview:

http://www.oracle.com/technetwork/java/javadb/overview/index.html

http://db.apache.org/derby/

DRIVER = com.microsoft.sqlserver.jdbc.SQLServerDriverURL = jdbc:sqlserver://server_host:port;databaseName=db_name;user=db_user;password=db_pass;

Microsoft SQL Server JDBC Driver:

http://msdn.microsoft.com/en-us/data/aa937724.aspx

Overview of the JDBC Driver:

http://msdn.microsoft.com/en-us/library/ms378749.aspx

Frequently Asked Questions (JDBC):

http://msdn.microsoft.com/en-us/data/cc325721.aspx

Setting the Connection Properties:

http://msdn.microsoft.com/en-us/library/ms378988.aspx

Building the Connection URL:

http://msdn.microsoft.com/en-us/library/ms378428(SQL.90).aspx

DRIVER = com.mysql.jdbc.DriverURL = jdbc:mysql://server_host/db_name?user=db_user&password=db_pass

MySQL Connector/J:

http://dev.mysql.com/doc/refman/5.1/en/connector-j.html

Driver/Datasource Class Names, URL Syntax and Configuration Properties for Connector/J:

http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html

Using Character Sets and Unicode:

http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html

Common Problems and Solutions:

http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-troubleshooting.html

DRIVER = oracle.jdbc.driver.OracleDriverURL = jdbc:oracle:thin:db_user/db_pass@server_host:port/db_name

Use the "Thin" driver.

JDBC, SQLJ, Oracle JPublisher and Universal Connection Pool (UCP) JDBC Driver Downloads:

http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/index.html

Oracle JDBC FAQ:

http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html

JDBC:

http://wiki.oracle.com/page/JDBC

Configuration file values: DRIVER = org.postgresql.DriverURL = jdbc:postgresql://server_host:port/db_name?user=db_user&password=db_pass

PostgreSQL JDBC Driver:

http://jdbc.postgresql.org/

Connecting to the Database:

http://jdbc.postgresql.org/documentation/83/connect.html

JDK 6 Java Database Connectivity (JDBC)-related APIs & Developer Guides:

http://java.sun.com/javase/6/docs/technotes/guides/jdbc/

JDBC Basics:

http://java.sun.com/docs/books/tutorial/jdbc/basics/index.html

JDBC Driver Search:

http://devapp.sun.com/product/jdbc/drivers

java.sql (Java Platform SE 6) API Reference:

http://java.sun.com/javase/6/docs/api/java/sql/package-summary.html

P4toDB:

If the SQL database is not configured to use UTF-8 and the data being replicated contains Unicode, the database might get corrupted. If the SQL database is not set to a case-sensitive collation, data that only differs by case could be incorrectly deleted. For example, for two Perforce jobs, one named "bug" and the other "BUG", deleting "bug" also removes "BUG" from the SQL side.

Binary data in the traits table created by the undocumented p4 attribute command is incorrectly converted as a string, garbling the data.

If the jobs view is created and an exception occurs in the same transaction, the rollback will not remove the view. Restarting replication after this happens will fail. Manually drop the view if this happens. This is rare.

If querying the jobs view returns an error such as the following (in MySQL):

ERROR 1242 (21000): Subquery returns more than 1 rowThere are probably jobs whose names only differs by case. Either delete the offending jobs or (in MySQL) make the key binary to force a case-sensitive comparison. For example: ALTER TABLE bodtext MODIFY `key` VARCHAR(2000) CHARACTER SET binary;

The number of journal records to fetch from the Perforce server must not be reduced to a number below the value stored in the saved state.

If the initialization with a checkpoint fails, check the log for entries similar to:

Feb 10, 2010 5:42:56 PM com.perforce.p4todb.Messages logSEVERE: Perforce error:  java.lang.OutOfMemoryError: Java heap spaceFeb 10, 2010 5:42:56 PM com.perforce.p4todb.Messages logSEVERE: Perforce error:  com.perforce.p4java.exception.ConnectionException: Java heap spaceThis can happen on Mac OS X systems most frequently due to a small default heap space, but it can also occur on other Unix systems as well. To fix the issue, edit p4todb.sh and uncomment the line: USER_JOPTS="-server -d64 -Xmx1G"

This raises the heap size to 1GB (-Xmx1G) and uses 64-bit addressing. The syntax of this line is:

USER_JOPTS="-server -d{32 or 64} -Xmx{memory amount}{M or G}"

For example, to set it for a 32-bit system and 512 megabytes:

USER_JOPTS="-server -d32 -Xmx512M"

Note: You will likely need to drop the partially created database to re-run the command to initialize P4toDB from the checkpoint. For most SQL systems the syntax will be similar to:

DROP DATABASE db_name;Then recreate the database with the same name.

Perforce text data types are converted as large objects (CLOB, LONGTEXT). Although many of these fields are typically used in Perforce to store small amounts of data (such as the email address in the user spec), it is possible to store more there than what will fit in a VARCHAR, so for correctness, the large object is used.

The combined size of keys in an MS SQL index can be up to a maximum of 900 bytes, which is smaller than what is needed to index the 1024 byte maximum Perforce path/identifier length.  Some Perforce server tables have multiple keys that long. While unusual in practice, if the to-be-replicated data exceeds this size, a runtime failure will happen similar to:

Operation failed. The index entry of length 1702 bytes for the index'domain_name_index' exceeds the maximum length of 900 bytes.SQL state: S0003, Vendor error code: 1946

When using the default identifier length, SQL warnings will be displayed at the end of the checkpoint conversion.

The size of individual column entries in a Perforce table can exceed the default server value for max_allowed_packet. If this situation is encountered, replication will exit with an error. Increasing the value and restarting both the MySQL server and P4toDB will remedy this problem.

The default bufferpool and tablespace pagesize is not large enough to create some of the Perforce tables because the rows can be long - for example, db.working. To avoid this problem, you must have a tablespace with a pagesize of 32K. A bufferpool size of 3000 is known to work.

The default transaction log size might not be large enough to handle the size of commits P4toDB uses. Either configure DB2 to increase the transaction log size, or specify a smaller number of journal records for P4toDB to fetch at a time. Errors like the following are indicative of this problem:

Batch failure. The batch was submitted, but at least one exceptionoccurred on an individual member of the batch. ERRORCODE=-4228Error for batch element #1: The current transaction was rolled backbecause of error "-964".. SQLCODE=-1476, SQLSTATE=40506Non-recoverable chain-breaking exception occurred during batch processing.The batch is terminated non-atomically. ERRORCODE=-4225

If the database is not created with the UTF-8 charset, an SQL warning with an SQLSTATE of 01517 might appear during conversion of unicode data.

When using identity columns, harmless warnings are shown when creating tables:

CREATE TABLE will create implicit sequence "BODTEXT_BODTEXTID_seq"for serial column "QA_BODTEXT.TRUEBODTEXTID"

View the original article here

No comments:

Post a Comment