Data Science
Pandas
Pandas is a Python library for manipulating and analyzing data.
It works with data tables whose variables (columns) and individuals (rows) carry labels. These tables are called DataFrames, similar to data frames in R. A DataFrame can easily be read from or written to a tabular file, and graphs can be drawn from it using matplotlib.
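As a minimal sketch of the ideas above, the following builds a small labeled DataFrame, writes it out as CSV, and reads it back. The column names, row labels, and values are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data: each row is an individual, each column a variable.
df = pd.DataFrame(
    {"height_cm": [172, 165, 180], "weight_kg": [68.0, 59.5, 82.3]},
    index=["alice", "bob", "carol"],
)

# Write the table as CSV text into an in-memory buffer, then read it back.
# With a real file you would pass a path instead of the buffer.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Rows and columns are addressed by their labels.
print(restored.loc["bob", "weight_kg"])
```

If matplotlib is installed, calling `restored.plot()` should draw one line per column from the same table.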
Why choose Pandas?
Using pandas you can:
- Retrieve data from CSV files, Excel tables, web pages, HDF5, etc.
- Group, slice, filter, reshape, and write data; the data can be one- or two-dimensional, contain missing values, or be time series with or without a regular frequency
- Process data that exceeds your machine's memory, as long as it is correctly formatted, by reading the sources piece by piece. The library was developed to meet the need for tools that handle large volumes of data for scientific or commercial use
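The piece-by-piece approach from the list above can be sketched with the `chunksize` parameter of `pandas.read_csv`, which returns an iterator of DataFrames instead of loading everything at once. The CSV content, the column names `city` and `sales`, and the tiny chunk size are assumptions chosen for illustration; a real dataset would live in a file and use a much larger chunk size.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file too large to load at once.
# One "sales" value is missing on purpose.
csv_text = """city,sales
Paris,10
Lyon,5
Paris,
Lyon,7
Paris,3
"""

totals = {}

# Read two rows at a time; each chunk is an ordinary DataFrame.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Group the chunk by city and sum the sales; sum() skips missing values.
    partial = chunk.groupby("city")["sales"].sum()
    # Fold the partial sums into the running totals.
    for city, value in partial.items():
        totals[city] = totals.get(city, 0.0) + float(value)

print(totals)
```

Only one chunk is held in memory at a time, so the same loop applies to much larger sources as long as the aggregate being computed (here a sum) can be combined across chunks.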