How to extract specific substrings into new rows, using regex?

admin 2023-07-17 11:18:02
Published: 2020-03-07 09:08:57

Question:

I have a DataFrame that contains the complete chats between users and customer agents. I want to extract only the messages sent by the user and create new rows from them, each keeping the same ticket ID:

    import re
    import pandas as pd

    ticket_id = pd.DataFrame(["1", "2"]).rename(columns={0: "Ticket-ID"})
    full_chat = pd.DataFrame([
        "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",
        "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM"
    ]).rename(columns={0: "Full-Chat"})
    merge_chat = pd.merge(ticket_id, full_chat, left_index=True, right_index=True, how="outer")

    def _split_row(text):
        cleaned_text = text.lower()
        lines = re.findall(r"\w*user (.*?) *\d\d:\d\d*", cleaned_text)
        for line in lines:
            print(line.split())

    print(merge_chat["Full-Chat"].apply(_split_row))

I would like the result to look like this:

    Ticket-ID  Full-Chat
    1          foo foo foo
    1          foo foo
    1          foo
    2          bar bar bar
    2          bar bar

Answer 1:

IIUC,

    # Replace each chat with the list of user messages found in it
    merge_chat["Full-Chat"] = merge_chat["Full-Chat"].apply(
        lambda i: re.findall(r"\w*user (.*?) *\d\d:\d\d*", i.lower()))

As of pandas 0.25.0,

    merge_chat.explode(column="Full-Chat")

will give you the result. In versions before 0.25.0:

    # One column per message, indexed by ticket; stack() turns the columns into rows
    df = pd.DataFrame(merge_chat["Full-Chat"].tolist(), index=merge_chat["Ticket-ID"]).stack()
    # Drop the per-ticket message counter and turn Ticket-ID back into a column
    df = df.reset_index(level=1, drop=True).reset_index()
    df.rename(columns={0: "Full-Chat"}, inplace=True)
    df

      Ticket-ID    Full-Chat
    0         1  foo foo foo
    1         1      foo foo
    2         1          foo
    3         2  bar bar bar
    4         2      bar bar

Answer 2:

I tested this and it works:

    import re
    import pandas as pd

    ticket_id = pd.DataFrame(["1", "2"]).rename(columns={0: "Ticket-ID"})
    full_chat = pd.DataFrame([
        "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",
        "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM"
    ]).rename(columns={0: "Full-Chat"})
    merge_chat = pd.merge(ticket_id, full_chat, left_index=True, right_index=True, how="outer")

    def split_row(text, ticket_id):
        cleaned_text = text.lower()
        lines = re.findall(r"\w*user (.*?) *\d\d:\d\d*", cleaned_text)
        # One dict per user message. DataFrame.append was removed in pandas 2.0,
        # so rows are collected in a list instead of being appended one at a time.
        return [{"Ticket-ID": ticket_id, "Full-Chat": line} for line in lines]

    rows = []
    for index, row in merge_chat.iterrows():
        rows.extend(split_row(row["Full-Chat"], row["Ticket-ID"]))

    Output_df = pd.DataFrame(rows, columns=["Ticket-ID", "Full-Chat"])
    Output_df.head()

Output:

      Ticket-ID    Full-Chat
    0         1  foo foo foo
    1         1      foo foo
    2         1          foo
    3         2  bar bar bar
    4         2      bar bar
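To see what the regular expression itself contributes, independent of pandas, here is a minimal standard-library sketch. It assumes the same kind of pattern as in the answers (`\w*user`, a lazy capture group, then an `hh:mm` timestamp) and the sample chats from the question; the names `USER_MESSAGE` and `user_rows` are hypothetical and only used for illustration.

```python
import re

# Capture the text after "user " up to the next hh:mm timestamp.
# "agent ..." segments never match because the literal "user " is required.
USER_MESSAGE = re.compile(r"\w*user (.*?) *\d\d:\d\d")


def user_rows(ticket_ids, chats):
    """Return one (ticket_id, message) tuple per user message."""
    rows = []
    for ticket_id, chat in zip(ticket_ids, chats):
        # Lowercase first, as the answers do, so "User" matches "user".
        for message in USER_MESSAGE.findall(chat.lower()):
            rows.append((ticket_id, message))
    return rows


ticket_ids = ["1", "2"]
chats = [
    "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, "
    "User foo foo 12:13 PM, Agent bar bar 12:13 PM, "
    "User foo 12:14 PM, Agent bar 12:14 PM",
    "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, "
    "User bar bar 12:13 PM",
]

rows = user_rows(ticket_ids, chats)
for ticket_id, message in rows:
    print(ticket_id, message)
```

The resulting list of tuples could then be handed to `pd.DataFrame(rows, columns=["Ticket-ID", "Full-Chat"])` if a DataFrame is needed. Note that lowercasing the chat also lowercases the extracted messages; compiling the pattern with `re.IGNORECASE` instead would be one way to keep the original casing.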