evidently - Interactive reports to analyze machine learning models during validation or production monitoring

  •    Jupyter

Interactive reports and JSON profiles to analyze, monitor and debug machine learning models. Evidently helps evaluate machine learning models during validation and monitor them in production. The tool generates interactive visual reports and JSON profiles from pandas DataFrames or CSV files. You can use visual reports for ad hoc analysis, debugging and team sharing, and JSON profiles to integrate Evidently into prediction pipelines or with other visualization tools.

sparkmagic - Jupyter magics and kernels for working with remote Spark clusters

  •    Python

Sparkmagic is a set of tools for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. The Sparkmagic project includes a set of magics for interactively running Spark code in multiple languages, as well as some kernels that you can use to turn Jupyter into an integrated Spark environment. There are two ways to use sparkmagic. Head over to the examples section for a demonstration on how to use both models of execution.

Marketstore - DataFrame Server for Financial Timeseries Data

  •    Go

MarketStore is a database server optimized for financial timeseries data. You can think of it as an extensible DataFrame service that is accessible from anywhere in your system, at higher scalability. It is designed from the ground up to address scalability issues around handling large amounts of financial market data used in algorithmic trading backtesting, charting, and analyzing price history with data spanning many years, including tick-level data for all US equities or the exploding cryptocurrency space. If you are struggling with managing lots of HDF5 files, this is a perfect solution to your problem.




pdpipe - Easy pipelines for pandas DataFrames.

  •    Python

Easy pipelines for pandas DataFrames. Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.
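
The optional-dependency behaviour described above can be sketched roughly as follows. This is a minimal illustration of the general pattern, not pdpipe's actual code: `load_optional_stages` and the stage names are hypothetical.

```python
import importlib.util
import warnings


def load_optional_stages(module_name="sklearn"):
    """Return names of extra stages, or an empty list if the dependency is missing.

    Hypothetical helper: the stage names are placeholders, not pdpipe's API.
    """
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        warnings.warn(
            f"{module_name} not found; stages that depend on it will not be loaded."
        )
        return []
    return ["ExampleEncoder", "ExampleScaler"]


# A module name that should not exist, to exercise the fallback path.
stages = load_optional_stages("surely_missing_module_xyz")
print(stages)
```

Checking for the module up front keeps the rest of the package importable, while the warning tells the user why some stages are unavailable.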

influxdb-client-python - InfluxDB 2.0 python client

  •    Python

Note: Use this client library with InfluxDB 2.x and InfluxDB 1.8+. For connecting to InfluxDB 1.7 or earlier instances, use the influxdb-python client library. The API of influxdb-client-python is not backwards-compatible with the old client, influxdb-python.

PandasToPowerpoint - Python utility to take a Pandas DataFrame and create a PowerPoint table

  •    Python

Converts a Pandas DataFrame to a PowerPoint table on the given Slide of a PowerPoint presentation. The table is a standard PowerPoint table and can easily be modified with the PowerPoint tools, for example by resizing columns or changing formatting.

pymarketstore - Python driver for MarketStore

  •    Python

Construct a client object with the server endpoint. You can build query parameters using pymkts.Params.


biopandas - Working with molecular structures in pandas DataFrames

  •    Python

If you are a computational biologist, chances are that you cursed one too many times about protein structure files. Yes, I am talking about ye Goode Olde Protein Data Bank format, aka "PDB files." Nothing against PDB, it's a neatly structured format (if deployed correctly); yet, it is a bit cumbersome to work with PDB files in "modern" programming languages -- I am pretty sure we all agree on this.

s3bp - Read and write Python objects to S3, caching them on your hard drive to avoid unnecessary IO.

  •    Python

Read and write Python objects from/to S3, caching them on your hard drive to avoid unnecessary IO. Special care is given to pandas DataFrames. The boto3 package itself requires an AWS config file at ~/.aws/config with your AWS account credentials in order to communicate with AWS.
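
The idea of caching objects on disk to avoid repeated remote reads can be sketched as below. This is an illustration under stated assumptions, not s3bp's API: `load_cached` is a hypothetical helper, and `fake_remote_fetch` stands in for a real S3 download.

```python
import pickle
import tempfile
from pathlib import Path


def load_cached(key, fetch, cache_dir):
    """Return the object for key, calling fetch(key) only on a cache miss."""
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        # Cache hit: read the local copy instead of going to the remote store.
        with path.open("rb") as fh:
            return pickle.load(fh)
    obj = fetch(key)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return obj


calls = []


def fake_remote_fetch(key):
    """Stand-in for a remote download; records each call for inspection."""
    calls.append(key)
    return {"key": key, "rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("table", fake_remote_fetch, tmp)
    second = load_cached("table", fake_remote_fetch, tmp)

print(len(calls))
```

The second call is served from the local file, so the stand-in remote fetch runs only once.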

bevel - Ordinal regression in Python

  •    Python

Ordinal regression refers to a number of techniques that are designed to classify inputs into ordered (or ordinal) categories. This type of data is common in social science research settings where the dependent variable often comes from opinion polls or evaluations. For example, ordinal regression can be used to predict the letter grades of students based on the time they spend studying, or Likert scale responses to a survey based on the annual income of the respondent. In People Analytics at Shopify, we use ordinal regression to empower Shopify employees. Our annual engagement survey contains dozens of scale questions about wellness, team health, leadership and alignment. To better dig into this data we built bevel, a repository that contains simple, easy-to-use Python implementations of standard ordinal regression techniques.

dataframe-go - DataFrame for statistics and data manipulation

  •    Go

Dataframes are used for statistics and data manipulation. You can think of a dataframe as an Excel spreadsheet. This package is designed to be lightweight and intuitive. The package is production ready, but the API is not stable yet. Once stability is reached, version 1.0.0 will be tagged. It is recommended that your package manager lock to a commit id instead of the master branch directly.

ult - Super Fast Geospatial Aggregation Library

  •    Python

Geospatial problems are hard, especially when you have to relate a set of geometries (lines or polygons) to a substantial set of points. Traditional methodologies are essentially infeasible beyond 100k points without huge amounts of computing time. The solution is pretty simple: remove geometry entirely from the operation. Like many things in CS, this trades the precomputation time of the indexes for the performance boost of using them.
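
The precompute-then-look-up trade-off can be illustrated with a simple grid index. This sketch is an assumption for illustration only and is not ult's algorithm: the helper names, the square grid, and the rectangular areas are all hypothetical, and rectangle edges are assumed to fall on cell boundaries.

```python
from collections import Counter


def cell_of(x, y, size):
    """Map a coordinate to the integer grid cell that contains it."""
    return (int(x // size), int(y // size))


def build_index(rectangles, size):
    """Precompute cell -> area name for axis-aligned rectangles.

    rectangles maps a name to (min_x, min_y, max_x, max_y). Edges are assumed
    to lie on cell boundaries, so each cell belongs to at most one area.
    """
    index = {}
    for name, (min_x, min_y, max_x, max_y) in rectangles.items():
        x0, y0 = cell_of(min_x, min_y, size)
        x1, y1 = cell_of(max_x, max_y, size)
        for cx in range(x0, x1):
            for cy in range(y0, y1):
                index[(cx, cy)] = name
    return index


areas = {"west": (0, 0, 5, 10), "east": (5, 0, 10, 10)}
index = build_index(areas, size=1.0)  # the one-off precomputation cost

points = [(1.5, 2.5), (7.2, 9.9), (4.999, 0.0), (12.0, 3.0)]
# Each point is now a dictionary lookup; no geometry test is needed.
counts = Counter(index.get(cell_of(x, y, 1.0)) for x, y in points)
print(counts)
```

Points that fall outside every area are counted under `None`.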

quickviz - Visualize a pandas dataframe in a few clicks

  •    Jupyter

Quickviz provides widgets for quickly visualizing pandas dataframes. It interfaces with seaborn and pandas.plot. See the gallery (which is also a test suite) for more.

pydbgen - Random dataframe and database table generator

  •    Python

While it is easy to generate random numbers or simple words for learning Pandas or dataframe operations, it is often non-trivial to generate full data tables with meaningful yet random entries for the most commonly encountered fields in the world of databases, such as name, age, birthday, credit card number, SSN, email id, physical address, company name, job title, etc. This Python package generates a random database TABLE (or a Pandas dataframe, or an Excel file) based on the user's choice of data types (database fields). The user can specify the number of samples needed and can also designate a "PRIMARY KEY" for the database table. Finally, the TABLE is inserted into a new or existing database file of the user's choice.
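
The workflow described above (random rows, a primary key, insertion into a database) can be sketched with the standard library alone. This is not pydbgen's API: the function names, the `people` table, and the value lists are hypothetical, and an in-memory SQLite database stands in for a database file.

```python
import random
import sqlite3

# Placeholder value pools; a real generator would draw from much larger sets.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger"]
JOB_TITLES = ["Engineer", "Analyst", "Manager"]


def random_rows(num, seed=None):
    """Build num rows of (id, name, age, job_title) with random field values."""
    rng = random.Random(seed)
    return [
        (i, rng.choice(FIRST_NAMES), rng.randint(18, 90), rng.choice(JOB_TITLES))
        for i in range(1, num + 1)
    ]


def write_table(conn, rows):
    """Create the table if needed, with id as the primary key, and insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER, job_title TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
write_table(conn, random_rows(5, seed=0))
row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(row_count)
```

Passing a seed makes the generated rows reproducible, which is convenient when the table is used in tests or tutorials.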

yahoo-historical - Downloads historical EOD (end of day) prices from yahoo finance

  •    Python

delta-rs - A native Rust library for Delta Lake, with bindings into Python and Ruby.

  •    Rust

A native interface to Delta Lake. This library provides low level access to Delta tables in Rust, which can be used with data processing frameworks like datafusion, ballista, rust-dataframe, vega, etc. It also provides bindings to other higher level languages such as Python, Ruby, or Golang.

pandas-validator - Validation Library for pandas' DataFrame and Series.

  •    Python

Validates pandas objects such as DataFrame and Series. Validators can be defined like Django form classes. When we wrangle data with pandas, we use DataFrame frequently. DataFrame is very powerful and easy to handle, but it has no schema of its own, so it accepts irregular values without our noticing. These values cause confusion and affect the results of data wrangling.





