A data frame is a structure for storing data in a rectangular grid for easy overview. If you know the basics of R, you are probably already familiar with data frames. Each row of the grid holds the measurements or values of one observation, whereas each column is a vector containing the data for one specific variable. Hence, a single row can include values of different types, such as numeric, character and logical values. The data frame in Python is similar: it is a two-dimensional labeled data structure with columns of potentially different types. A pandas data frame consists of three principal components, namely the data, the index and the columns.
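As a minimal sketch of the three components named above, the following builds a small data frame and inspects its data, index and columns. The variable name `people` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column, each column has its own type
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, 25, 47],
    "member": [True, False, True],
})

print(people.to_numpy())  # the data, as a 2-D array
print(people.index)       # the row labels (0, 1, 2 by default)
print(people.columns)     # the column labels
print(people.dtypes)      # one type per column
```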
When it comes to data management in Python, you begin by creating a data frame, which is one of the easiest tasks to do: you pass your data to pd.DataFrame(). You can also pass optional parameters such as index and columns to label the rows and columns.
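A minimal sketch of this step, assuming the data starts out as a plain list of lists. The values and the row labels x, y and z are hypothetical and only illustrate the index and columns parameters.

```python
import pandas as pd

# Illustrative values: three rows, three columns
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# columns= names the columns; index= is optional and defaults to 0, 1, 2, ...
df = pd.DataFrame(data, columns=["A", "B", "C"])
labeled = pd.DataFrame(data, index=["x", "y", "z"], columns=["A", "B", "C"])

print(df)
print(labeled)
```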
After creating the data frame, we can look at how to select, add or delete an index, row or column. All of these actions start with selecting a component of the data frame.
Select Index, Row or Column
Let us assume that you have a data frame as given below and you want to access the value at index 0 for column A.
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
You can access this value in several ways, either by label or by position.
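As a sketch of those options, the snippet below rebuilds the A/B/C data frame and reads the value at index 0 of column A in four ways. The variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

# Label-based access: row label 0, column label "A"
by_label = df.loc[0, "A"]
by_label_fast = df.at[0, "A"]      # .at is for single values only

# Position-based access: first row, first column
by_position = df.iloc[0, 0]
by_position_fast = df.iat[0, 0]    # .iat is for single values only

print(by_label, by_label_fast, by_position, by_position_fast)
```

All four expressions return the same value here because the default row labels coincide with the row positions.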
If you wish to select a row, you can pass its row label to .loc, as given here.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Select the row labeled 'b'
print(df.loc['b'])
This displays the following output.
one    2.0
two    2.0
Name: b, dtype: float64
In another way, you can select a row by passing its integer location to .iloc, using the same data frame.
# Select the row at position 2 (the third row)
print(df.iloc[2])
This displays the following output.
one    3.0
two    3.0
Name: c, dtype: float64
There is a difference between the loc and iloc indexing attributes. While .loc works on the labels of your index, .iloc works on the positions in your index.
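The difference is easiest to see when the index labels are integers that do not match the row positions. The following is a small sketch with a made-up data frame named `frame`.

```python
import pandas as pd

# Hypothetical integer labels that do not match the row positions
frame = pd.DataFrame({"value": ["x", "y", "z"]}, index=[2, 0, 1])

by_label = frame.loc[0, "value"]    # label 0 is the second row
by_position = frame.iloc[0, 0]      # position 0 is the first row

print(by_label)
print(by_position)
```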
Add an Index, Row, or Column
To set the index when you create a data frame, pass the 'index' argument to the constructor. If nothing is specified, the data frame gets a numeric index starting from 0 by default. You can also turn an existing column into the index by calling set_index() on your data frame.
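A short sketch of both routes, using hypothetical column names `id` and `score`:

```python
import pandas as pd

# Hypothetical data: the "id" column is a natural candidate for the index
df = pd.DataFrame({"id": ["a", "b", "c"], "score": [10, 20, 30]})
print(df.index)            # default index starting at 0

# set_index returns a new data frame; the original is left unchanged
indexed = df.set_index("id")
print(indexed.loc["b", "score"])

# Passing index= at construction time sets the row labels directly
direct = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])
print(direct)
```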
Adding columns to your data frame works in much the same way as adding rows: you assign values to a new label. Let us look at an example.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
# Add a new column to an existing DataFrame by assigning a Series to a new column label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
# Add a new column computed from existing columns
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
This prints the data frame twice: first with the new column three (10, 20, 30), and then with the column four, which holds the sums 11, 22 and 33.
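Rows can be added in a similar spirit. The sketch below shows two common options, assigning to a new row label with .loc and stacking a second data frame with pd.concat. The row labels d and e and their values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [1, 2, 3]}, index=["a", "b", "c"])

# Assigning to a label that does not exist yet appends a row
df.loc["d"] = [4, 4]

# pd.concat stacks another data frame with the same columns underneath
extra = pd.DataFrame({"one": [5], "two": [5]}, index=["e"])
df = pd.concat([df, extra])

print(df)
```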
Delete Index, Row, or Column
Every data frame has an index, so think before you delete it. If you do not like the way the index looks, you can reset it with the .reset_index() method. Similarly, you can use the drop() method to delete columns, and set inplace to True to delete the column without reassigning the data frame.
# Delete columns from a DataFrame using del and pop
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# del removes the column in place
print("Deleting the first column using DEL function:")
del df['one']
print(df)
# pop removes the column in place and returns it as a Series
print("Deleting another column using POP function:")
df.pop('two')
print(df)
Running these commands prints the full data frame first, then the data frame without column one, and finally the data frame with only column three left.
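The reset_index() and drop() calls mentioned at the start of this section can be sketched as follows. The data frame and the names `reset` and `without_two` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=["a", "b", "c"])

# reset_index moves the current index into a column and restores 0, 1, 2, ...
# Pass drop=True to discard the old index instead of keeping it as a column.
reset = df.reset_index()
print(reset)

# drop returns a new data frame by default
without_two = df.drop(columns=["two"])

# inplace=True modifies df itself and returns None
df.drop(columns=["one"], inplace=True)
print(df)
```

Without inplace=True, the result of drop() has to be assigned to a variable, otherwise the change is lost.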
To remove duplicate rows from your data frame, execute df.drop_duplicates(). To remove specific rows, whether or not they are duplicates, use the drop() method with the labels of the rows you want to remove.
# Check out the DataFrame 'df'
print(df)
# Drop the row at index position 1
print(df.drop(df.index[1]))
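A self-contained sketch of both row-removal methods, using a hypothetical data frame that contains one repeated row:

```python
import pandas as pd

# Hypothetical data: the rows at positions 1 and 2 are identical
df = pd.DataFrame({"A": [1, 2, 2, 3], "B": ["x", "y", "y", "z"]})

# drop_duplicates keeps the first occurrence of each repeated row
unique_rows = df.drop_duplicates()

# drop removes rows by label; df.index[1] looks up the label at position 1
trimmed = df.drop(df.index[1])

print(unique_rows)
print(trimmed)
```

Both calls return new data frames and leave df untouched.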
Pandas also lets you perform a variety of other tasks on your data frame. You can think of a data frame as a spreadsheet, a SQL table, or a dictionary of Series objects. Pandas has many powerful features, so it is worth understanding how they work and when they are useful the next time you handle a data frame.